Project Gigapower · 02 — Analysis & Recommendations¶
🗺️ Purpose - what we are really solving for¶
Board brief: *“Where should we commit our next $500 M to secure 15 GWh of
battery‑cell output by 2029 — with minimum geopolitical risk and maximum supply‑chain optionality?”*
This notebook answers two concrete questions:
| # | Key question | Decision metric produced |
|---|---|---|
| 1 | Which countries give us the best risk‑adjusted ROI on a 10‑year horizon? | Gigafactory Attractiveness Index — one number per country |
| 2 | How sensitive is that ranking to board-level trade-offs? (e.g. “safety first” vs. “cost first”) | Scenario panels: Risk-heavy & Cost-heavy Top-10s |
Scope parameters
- Capacity target: 15 GWh p.a. cell output (phase‑1), expandable to 50 GWh.
- Capital envelope: US $500 M equity; local incentives to offset up to 30 %.
- Time‑to‑ground‑breaking: 24 months → shortlist must have shovel‑ready industrial land.
- Supply‑chain resilience: Preference for in‑country critical‑mineral availability or duty‑free import route.
- Risk tolerance: Must sit above global median on composite governance index.
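The risk-tolerance rule can be written as a simple country screen. This is a minimal sketch that assumes the "composite governance index" is the unweighted mean of the three WGI estimates listed in the data dictionary (political_stability_est, control_of_corruption_est, rule_of_law_est), averaged over each country's years. The helper name passes_governance_screen and the toy values are hypothetical.

```python
import pandas as pd

# Assumption: the composite is the plain mean of the three WGI estimates.
WGI_COLS = ["political_stability_est", "control_of_corruption_est", "rule_of_law_est"]


def passes_governance_screen(panel: pd.DataFrame) -> pd.Series:
    """Boolean Series indexed by country: True if the country's average
    composite governance score is strictly above the cross-country median."""
    composite = panel[WGI_COLS].mean(axis=1)                 # per country-year
    by_country = composite.groupby(panel["country"]).mean()  # average over years
    return by_country > by_country.median()


# Toy panel with invented values, purely for illustration
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "year": [2022, 2023] * 3,
    "political_stability_est": [1.0, 1.2, 0.1, 0.0, -1.0, -1.2],
    "control_of_corruption_est": [1.5, 1.4, 0.2, 0.3, -0.8, -0.9],
    "rule_of_law_est": [1.3, 1.3, 0.0, 0.1, -1.1, -1.0],
})
screen = passes_governance_screen(toy)
print(screen)
```

With an odd number of countries the median country itself fails a strict "above median" test; whether the board intends "above" or "at or above" is worth confirming.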
Deliverables at the end of this analysis
- Data‑backed shortlist of five priority markets with pillar‑by‑pillar justification.
- Interactive weight‑slider tool for the Steering Committee to test their own assumptions live.
- 60‑day diligence workplan outlining site visits, JV outreach, and secondary‑research tasks (tax, tariffs, sanctions, ESG).
📦 Data Provenance¶
Source notebook: 01_Data_Acquisition_and_Cleaning.ipynb
Final dataset loaded here: model_ready_data.csv
| Feed | Provider | Example metrics |
|---|---|---|
| World Bank | WDI | GDP, population, governance |
| UN Comtrade | UN Stats | Battery‑precursor trade flows |
| ILO STAT | ILO | Unit labour cost index |
| BGS | British Geological Survey (UK) | Critical-mineral mine production |
| ACLED | Armed Conflict Location & Event Data | Political‑violence frequency |
All sources were cleaned, harmonised and reshaped into a country‑year panel (2010 – 2023) with ISO‑3 codes and consistent units.
🔍 Analytical Workflow¶
Exploratory scan
- Histograms, a correlation heatmap & targeted scatter plots confirm variable ranges and detect outliers.
Pillar construction — six “higher‑is‑better” pillars
market_score · cost_score · mineral_index · lpi_score · industry_pct_gdp · risk_score
- Each pillar is z-scored (μ = 0, σ = 1) so that pillars measured in different units and with different spreads contribute on a common scale.
Weighted ‘Gigafactory Attractiveness Index’
- Baseline weights: 25 % Market / 25 % Risk / 15 % each for Cost, Minerals and Logistics / 5 % Industry.
- Stored per country-year, then averaged to 2010-23 country means.
Country segmentation
- K‑Means (optimal k = 2) splits the universe into
“Safe Mature Hubs” vs. “Risk‑Weighted Frontiers”.
Visual synthesis
- 2 × 2 scatter (Attractiveness × Risk) with cluster colouring & bubble = market size.
- Tornado charts break down each finalist’s index by pillar share.
- Interactive slider sheet lets executives re‑weight pillars live in‑meeting.
Sensitivity analysis
- Risk‑heavy (40 % Risk) & Cost‑heavy (30 % Cost) scenarios plus Top‑10 bar panels.
Recommendation set
- Australia · Canada · United States · Germany · Japan
- Actionable next‑step workplan & secondary‑research checklist (tax, tariffs, ESG, sanctions).
The remainder of this notebook walks through each step, culminating in a data‑backed shortlist and a 60‑day diligence roadmap for the Steering Committee.
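The index construction and the two sensitivity scenarios can be sketched as follows. This is a minimal sketch under stated assumptions: the "3 × 15 %" weights go to cost_score, mineral_index and lpi_score; z-scores are pooled across all country-years; and in each scenario the untouched pillars are rescaled proportionally so the weights still sum to 1 (the workflow above does not say how the remaining weight is redistributed). The names attractiveness_index and reweight are hypothetical.

```python
import pandas as pd

# Baseline weights from the workflow summary. Assumption: the "3 × 15 %"
# block is cost_score, mineral_index and lpi_score.
BASELINE_WEIGHTS = {
    "market_score": 0.25, "risk_score": 0.25,
    "cost_score": 0.15, "mineral_index": 0.15, "lpi_score": 0.15,
    "industry_pct_gdp": 0.05,
}


def attractiveness_index(panel: pd.DataFrame, weights: dict) -> pd.Series:
    """Z-score each higher-is-better pillar, take the weighted sum per
    country-year, then average to one score per country (highest first)."""
    pillars = list(weights)
    z = (panel[pillars] - panel[pillars].mean()) / panel[pillars].std(ddof=0)
    per_row = z.mul(pd.Series(weights), axis=1).sum(axis=1)
    return per_row.groupby(panel["country"]).mean().sort_values(ascending=False)


def reweight(base: dict, pillar: str, new_weight: float) -> dict:
    """Pin one pillar at `new_weight` and rescale the others proportionally
    so the weights still sum to 1 (hypothetical scenario rule)."""
    if not 0.0 <= new_weight <= 1.0:
        raise ValueError("new_weight must lie in [0, 1]")
    others = {k: v for k, v in base.items() if k != pillar}
    scale = (1.0 - new_weight) / sum(others.values())
    out = {k: v * scale for k, v in others.items()}
    out[pillar] = new_weight
    return out


scenarios = {
    "baseline": BASELINE_WEIGHTS,
    "risk_heavy": reweight(BASELINE_WEIGHTS, "risk_score", 0.40),
    "cost_heavy": reweight(BASELINE_WEIGHTS, "cost_score", 0.30),
}

# Toy panel (invented numbers): X beats Y on every pillar in both years
toy = pd.DataFrame({"country": ["X", "X", "Y", "Y"],
                    **{p: [3.0, 4.0, 1.0, 2.0] for p in BASELINE_WEIGHTS}})
for name, w in scenarios.items():
    print(name, attractiveness_index(toy, w).round(3).to_dict())
```

Comparing the top-N countries across the three scenario rankings gives the Risk-heavy and Cost-heavy panels described above.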
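The summary reports an optimal k = 2 for the K-Means segmentation without stating the selection criterion. One common approach, sketched below, is to fit K-Means over a range of k and keep the k with the highest silhouette score (silhouette_score is among the metrics imported in the setup cell). The data are synthetic blobs standing in for country-level pillar scores, and pick_k_by_silhouette is a hypothetical helper, not the notebook's own code.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def pick_k_by_silhouette(X, k_range=range(2, 7), seed=0):
    """Fit K-Means for each k and return (best_k, {k: silhouette score})."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)  # k with the highest mean silhouette
    return best_k, scores


# Synthetic stand-in for standardised pillar scores: two well-separated groups
X, _ = make_blobs(n_samples=80, centers=[[-3, -3], [3, 3]],
                  cluster_std=0.5, random_state=42)
best_k, scores = pick_k_by_silhouette(X)
print(best_k, {k: round(v, 3) for k, v in scores.items()})
```

Calinski-Harabasz and Davies-Bouldin (also imported in the setup cell) can be tabulated the same way as a cross-check.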
Data Dictionary: Final Analytical Dataset¶
This section describes each variable in our final master_df DataFrame. The dataset covers 80 countries over 2010-2023 (1,015 country-year observations; not every country is observed in every year).
Key Identifiers¶
- country: The name of the country, harmonized across all datasets.
- year: The year of observation.
Pillar 1: Market & Economic Opportunity¶
- gdp_usd: Gross Domestic Product in current U.S. dollars. (Source: World Bank)
- gdp_growth_pct: Annual percentage growth rate of GDP. (Source: World Bank)
- population: Total population. (Source: World Bank)
- fdi_net_inflows_pct_gdp: Foreign Direct Investment net inflows as a percentage of GDP. (Source: World Bank)
- manufacturing_pct_gdp: Manufacturing value added as a percentage of GDP. (Source: World Bank)
- access_to_electricity_pct: Percentage of the population with access to electricity. (Source: World Bank)
- electric_power_consumption_kwh_pc: Electric power consumption in kWh per capita. (Source: World Bank)
- gross_capital_formation_pct_gdp: Gross capital formation (outlays on additions to fixed assets plus net changes in inventories) as a percentage of GDP; a measure of gross, not net, investment. (Source: World Bank)
- total_imports_usd: Total annual value in U.S. dollars of imported battery-related goods. (Source: UN Comtrade)
Pillar 2: Cost Competitiveness¶
- wage_usd: Average monthly manufacturing wage in U.S. dollars. (Source: ILOSTAT, World Bank)
- inflation_pct: Annual inflation rate of consumer prices. (Source: World Bank)
Pillar 3: Supply Chain & Manufacturing Readiness¶
- industry_pct_gdp: Value added by the entire industrial sector (including construction) as a percentage of GDP. (Source: World Bank)
- lpi_score: The Logistics Performance Index (1 to 5 scale), scoring trade and transport infrastructure quality. (Source: World Bank)
- cobalt,_mine; graphite; lithium_minerals; manganese_ore; nickel,_mine: Annual mine production of each critical mineral, in metric tonnes. (Source: BGS)
Pillar 4: Governance & Geopolitical Risk¶
- political_stability_est, control_of_corruption_est, rule_of_law_est: Worldwide Governance Indicators (WGI) estimates of institutional quality and stability (roughly -2.5 to +2.5; higher is better). (Source: World Bank)
- total_disorder_events: Total annual count of political violence and demonstration events. (Source: ACLED)
- acled_covered: Boolean flag indicating whether the total_disorder_events value comes from a year in which ACLED covers the country. (Source: ACLED, derived)
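The acled_covered flag matters for interpretation: in the dataset preview in Section 2, the Albania rows record 0 disorder events while acled_covered is False, then 239 events in 2018, the first covered year shown. Those zeros most likely mean "not tracked" rather than "calm". A minimal sketch of one way to stop them reading as low risk, assuming the Risk pillar can handle missing values downstream (for example through imputation); mask_uncovered_events is a hypothetical helper.

```python
import pandas as pd


def mask_uncovered_events(panel: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where total_disorder_events is NaN whenever acled_covered
    is False, so 'no ACLED coverage' is not read as 'zero unrest'.
    Expects acled_covered to be a boolean column."""
    out = panel.copy()
    out["total_disorder_events"] = out["total_disorder_events"].where(out["acled_covered"])
    return out


# Two rows copied from the dataset preview (Albania, 2017 and 2018)
sample = pd.DataFrame({
    "country": ["Albania", "Albania"],
    "year": [2017, 2018],
    "total_disorder_events": [0, 239],
    "acled_covered": [False, True],
})
print(mask_uncovered_events(sample))
```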
1. Setup: Loading Libraries and Configuration¶
This first cell imports all the necessary Python libraries for our analysis, including pandas for data manipulation, plotly and seaborn for visualization, and scikit-learn for our machine learning models (PCA and K-Means).
# 02_Analysis_and_Recommendations.ipynb
# -------------------------------------------------------------
import plotly.io as pio
pio.renderers.default = "notebook"
# --- Core Numerics & Stats ---
import pandas as pd
from pathlib import Path
import numpy as np
import scipy.stats as stats
from scipy.stats import mode
from scipy.stats import gaussian_kde
# --- Visualization ---
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
from typing import Dict
# --- Notebook Settings ---
# This command ensures that plots appear directly in the notebook
%matplotlib inline
# This is the magic line for high-resolution plots (e.g., for Retina displays)
%config InlineBackend.figure_format = 'retina'
# --- Machine Learning / Modeling ---
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import statsmodels.api as sm  # OLS trendlines in the scatter plots
from sklearn.metrics import silhouette_score, calinski_harabasz_score, davies_bouldin_score
# --- House-Keeping ---
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
# --- Display & Aesthetics ---
pd.set_option("display.float_format", "{:,.2f}".format)
px.defaults.template = "plotly_white" # clean white background
sns.set_style("white") # same for seaborn
plt.rcParams.update({ # grid-free matplotlib default
"axes.grid": False,
"figure.figsize": (10, 6)
})
print("✅ Libraries imported & global aesthetics configured.")
✅ Libraries imported & global aesthetics configured.
2. Load the Analysis-Ready Data¶
This step loads the final, clean, and imputed dataset we created in our first notebook. This master_df DataFrame will be the foundation for all the analysis and modeling to follow.
# --- Action: Load the analysis-ready dataset (privacy-safe printout) ---
file_path = Path("model_ready_data.csv")
print(f"Loading dataset: {file_path.name}")
try:
    master_df = pd.read_csv(file_path, low_memory=False)
    # Core diagnostics
    n_rows, n_cols = master_df.shape
    n_countries = master_df["country"].nunique()
    mem_usage_mb = master_df.memory_usage(deep=True).sum() / 1_048_576
    print(f"✅ Loaded successfully — {n_rows:,} rows × {n_cols} columns")
    print(f" • Countries represented : {n_countries}")
    print(f" • Memory footprint : {mem_usage_mb:,.1f} MB")
    # Integrity check
    if master_df.duplicated(subset=["country", "year"]).any():
        print("⚠️ Duplicate country-year records detected.")
    else:
        print("👍 Each country-year combination is unique.")
    display(master_df.head())
except FileNotFoundError:
    print("❌ File not found. Ensure 'model_ready_data.csv' sits in the notebook folder.")
except Exception as e:
    print(f"❌ An unexpected error occurred: {e}")
Loading dataset: model_ready_data.csv
✅ Loaded successfully — 1,015 rows × 25 columns
 • Countries represented : 80
 • Memory footprint : 0.2 MB
👍 Each country-year combination is unique.
|  | country | year | gdp_usd | gdp_growth_pct | population | fdi_net_inflows_pct_gdp | manufacturing_pct_gdp | access_to_electricity_pct | electric_power_consumption_kwh_pc | gross_capital_formation_pct_gdp | ... | cobalt,_mine | graphite | lithium_minerals | manganese_ore | nickel,_mine | political_stability_est | control_of_corruption_est | rule_of_law_est | total_disorder_events | acled_covered |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Albania | 2010 | 11,926,926,615.80 | 3.71 | 2,913,021.00 | 9.14 | 5.45 | 99.60 | 1,943.34 | 28.43 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 1,954.00 | -0.19 | -0.53 | -0.39 | 0 | False |
| 1 | Albania | 2015 | 11,386,853,113.02 | 2.22 | 2,880,703.00 | 8.69 | 5.67 | 100.00 | 2,098.10 | 24.41 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 6,280.00 | 0.34 | -0.55 | -0.32 | 0 | False |
| 2 | Albania | 2016 | 11,861,199,830.84 | 3.31 | 2,876,101.00 | 8.81 | 5.68 | 99.90 | 1,994.37 | 24.37 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 3,952.00 | 0.34 | -0.47 | -0.32 | 0 | False |
| 3 | Albania | 2017 | 13,019,726,211.74 | 3.80 | 2,873,457.00 | 7.86 | 6.16 | 99.90 | 2,145.15 | 24.58 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 5,323.00 | 0.37 | -0.48 | -0.41 | 0 | False |
| 4 | Albania | 2018 | 15,379,509,891.72 | 4.02 | 2,866,376.00 | 7.83 | 6.71 | 100.00 | 2,276.74 | 25.92 | ... | 0.00 | 0.00 | 0.00 | 0.00 | 4,204.00 | 0.37 | -0.55 | -0.41 | 239 | True |
5 rows × 25 columns
2.1. Final Data Verification: Checking for Missing Values¶
- This code counts the missing values (NaNs) in every column of master_df and reports only the columns that contain them.
# --- Action: Audit remaining missing values ---------------------------------
if "master_df" in locals():
    # Count & percent missing per column
    miss_ct = master_df.isna().sum()
    miss_pct = miss_ct.div(len(master_df)).mul(100).round(1)
    # Keep only columns with at least one NaN
    miss_tbl = (
        pd.DataFrame({"missing_count": miss_ct, "missing_pct": miss_pct})
        .query("missing_count > 0")
        .sort_values("missing_pct", ascending=False)
    )
    print("--- Missing Values Report ---")
    if miss_tbl.empty:
        print("✅ Excellent! No missing values found in the dataset.")
    else:
        print("⚠️ The following columns still contain missing values (as expected for volatile indicators):")
        display(miss_tbl)
        print("We’ll decide whether to impute, drop, or model around these during the pillar-scoring step.")
else:
    print("❌ 'master_df' not found. Please load the data first.")
--- Missing Values Report --- ✅ Excellent! No missing values found in the dataset.
# --- EDA Step 1: Descriptive Statistics Snapshot --------------------------------
print("--- EDA Step 1: Descriptive Statistics Snapshot ---")
numeric_cols = master_df.select_dtypes(include=[np.number]).columns
summary_stats = (
master_df.loc[:, numeric_cols]
.describe(percentiles=[0.25, 0.5, 0.75])
.T # tall format is easier to scan
.rename(columns={"25%": "p25", "50%": "median", "75%": "p75"})
.assign(range=lambda df: df["max"] - df["min"])
.round(2)
)
display(summary_stats)
--- EDA Step 1: Descriptive Statistics Snapshot ---
|  | count | mean | std | min | p25 | median | p75 | max | range |
|---|---|---|---|---|---|---|---|---|---|
| year | 1,015.00 | 2,016.38 | 3.92 | 2,010.00 | 2,013.00 | 2,016.00 | 2,020.00 | 2,023.00 | 13.00 |
| gdp_usd | 1,015.00 | 980,636,615,212.94 | 2,804,863,988,649.44 | 4,054,730,077.58 | 46,589,833,347.14 | 221,985,621,537.50 | 598,672,532,690.49 | 27,720,709,000,000.00 | 27,716,654,269,922.42 |
| gdp_growth_pct | 1,015.00 | 2.98 | 3.79 | -17.82 | 1.47 | 2.98 | 4.99 | 24.62 | 42.44 |
| population | 1,015.00 | 65,073,117.38 | 200,642,978.32 | 318,041.00 | 4,787,253.50 | 10,536,632.00 | 46,883,847.00 | 1,438,069,596.00 | 1,437,751,555.00 |
| fdi_net_inflows_pct_gdp | 1,015.00 | 7.01 | 38.48 | -440.13 | 1.42 | 2.81 | 4.81 | 452.22 | 892.35 |
| manufacturing_pct_gdp | 1,015.00 | 14.17 | 5.65 | 3.53 | 10.35 | 13.32 | 17.69 | 37.15 | 33.62 |
| access_to_electricity_pct | 1,015.00 | 98.50 | 5.18 | 31.10 | 99.60 | 100.00 | 100.00 | 100.00 | 68.90 |
| electric_power_consumption_kwh_pc | 1,015.00 | 5,351.58 | 5,999.83 | 157.03 | 2,031.31 | 4,028.62 | 6,686.49 | 54,799.17 | 54,642.15 |
| gross_capital_formation_pct_gdp | 1,015.00 | 22.52 | 5.26 | 10.97 | 19.40 | 22.03 | 24.75 | 53.22 | 42.25 |
| inflation_pct | 1,015.00 | 3.80 | 5.01 | -2.10 | 1.22 | 2.67 | 4.88 | 72.31 | 74.41 |
| total_imports_usd | 1,015.00 | 403,844,031.63 | 1,348,277,495.05 | 3,412.00 | 3,059,114.07 | 23,186,036.00 | 187,005,728.54 | 19,376,702,357.00 | 19,376,698,945.00 |
| wage_usd | 1,015.00 | 1,596.14 | 1,916.37 | 2.63 | 298.11 | 585.79 | 2,532.61 | 8,460.40 | 8,457.77 |
| lpi_score | 1,015.00 | 3.23 | 0.52 | 2.16 | 2.78 | 3.18 | 3.70 | 4.23 | 2.07 |
| industry_pct_gdp | 1,015.00 | 25.86 | 7.46 | 9.97 | 20.59 | 25.21 | 30.03 | 61.73 | 51.76 |
| cobalt,_mine | 1,015.00 | 374.72 | 1,402.01 | 0.00 | 0.00 | 0.00 | 0.00 | 10,237.00 | 10,237.00 |
| graphite | 1,015.00 | 13,857.93 | 104,359.57 | 0.00 | 0.00 | 0.00 | 0.00 | 1,800,000.00 | 1,800,000.00 |
| lithium_minerals | 1,015.00 | 14,549.36 | 131,947.08 | 0.00 | 0.00 | 0.00 | 0.00 | 2,021,498.00 | 2,021,498.00 |
| manganese_ore | 1,015.00 | 346,966.96 | 1,690,468.68 | 0.00 | 0.00 | 0.00 | 0.00 | 20,000,000.00 | 20,000,000.00 |
| nickel,_mine | 1,015.00 | 24,263.08 | 106,378.80 | 0.00 | 0.00 | 0.00 | 0.00 | 1,579,000.00 | 1,579,000.00 |
| political_stability_est | 1,015.00 | 0.20 | 0.79 | -2.81 | -0.38 | 0.34 | 0.86 | 1.62 | 4.43 |
| control_of_corruption_est | 1,015.00 | 0.39 | 1.04 | -1.32 | -0.51 | 0.21 | 1.34 | 2.40 | 3.73 |
| rule_of_law_est | 1,015.00 | 0.47 | 0.95 | -1.30 | -0.37 | 0.43 | 1.34 | 2.12 | 3.43 |
| total_disorder_events | 1,015.00 | 818.38 | 2,615.69 | 0.00 | 0.00 | 0.00 | 340.00 | 23,311.00 | 23,311.00 |
Key Insights from the Statistical Snapshot 💡¶
This summary table provides our first look at the characteristics of our data across 1,015 country-year observations. A few key patterns are immediately apparent:
- Significant Scale & Skew: Many variables, such as gdp_usd, population, total_imports_usd, and all the mineral production metrics, are highly skewed. Their mean is far larger than their median (the 50th percentile, shown in the median column), which indicates that a few very large countries or top producers pull the average up significantly.
- Concentrated Mineral Production: The mineral columns (e.g., graphite, lithium_minerals) have a median, and even a 75th percentile (p75), of 0. This confirms that production is concentrated in a handful of countries: at least three-quarters of country-year observations record zero output.
- Presence of Outliers: Some indicators, particularly fdi_net_inflows_pct_gdp (-440 % to +452 %) and inflation_pct (-2.1 % to 72.3 %), span an extremely wide range between their min and max values. This signals significant outliers, likely representing small economies or countries undergoing economic shocks.
These initial findings matter: they show that looking at the "average" country can be misleading. The histograms in the next step make this skewness, and the influence of outliers, directly visible.
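The mean-versus-median check above can be made systematic. A minimal sketch that tabulates mean, median and sample skewness (pandas DataFrame.skew) for each numeric column and flags columns above a skewness threshold; the threshold of 1.0 is a common rule of thumb rather than a value taken from this analysis, and skew_report is a hypothetical helper.

```python
import pandas as pd


def skew_report(df: pd.DataFrame, skew_threshold: float = 1.0) -> pd.DataFrame:
    """Mean, median and sample skewness per numeric column, with a flag for
    columns whose skewness exceeds `skew_threshold` (rule-of-thumb cut-off)."""
    num = df.select_dtypes("number")
    report = pd.DataFrame({
        "mean": num.mean(),
        "median": num.median(),
        "skew": num.skew(),
    })
    report["flag_skewed"] = report["skew"] > skew_threshold
    return report.sort_values("skew", ascending=False)


# Toy columns: one symmetric, one with a single large value pulling the mean up
toy = pd.DataFrame({
    "symmetric": [1, 2, 3, 4, 5],
    "long_tail": [1, 1, 1, 1, 100],
})
print(skew_report(toy))
```

Running skew_report(master_df) would list the variables that need a log or winsorising step before PCA.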
3.2 EDA Step 2: Univariate Distributions¶
- We will now create histograms for a selection of our most important indicators from each pillar. This allows us to visually inspect their distribution, confirming patterns like skewness and identifying potential outliers.
# --- EDA Step 2 (interactive, with bar separation & enriched KDE hovers) -----
cols_to_plot = [
    "gdp_usd", "gdp_growth_pct", "total_imports_usd",
    "wage_usd", "inflation_pct", "manufacturing_pct_gdp",
    "lpi_score", "political_stability_est", "total_disorder_events"
]
fig = make_subplots(
    rows=3, cols=3,
    subplot_titles=[col.replace("_", " ").title() for col in cols_to_plot]
)
for idx, col in enumerate(cols_to_plot, start=1):
    r, c = divmod(idx - 1, 3)
    r += 1; c += 1
    data = master_df[col].dropna()
    if data.empty:
        continue
    # Histogram (bars with white outline for separation)
    fig.add_trace(
        go.Histogram(
            x=data,
            nbinsx=30,
            histnorm="probability density",
            marker=dict(
                color="#0B0055",
                line=dict(color="white", width=1)
            ),
            opacity=0.95,
            showlegend=False
        ),
        row=r, col=c
    )
    # KDE (“top line”) with enriched hover
    if data.nunique() > 1:
        kde = gaussian_kde(data)
        x_grid = np.linspace(data.min(), data.max(), 250)
        y_grid = kde(x_grid)
        # Summary statistics repeated so each point can display them
        mean, median, std = data.mean(), data.median(), data.std(ddof=0)
        customdata = np.column_stack([
            np.full_like(x_grid, mean),
            np.full_like(x_grid, median),
            np.full_like(x_grid, std)
        ])
        fig.add_trace(
            go.Scatter(
                x=x_grid,
                y=y_grid,
                mode="lines",
                line=dict(color="#F86302", width=2),
                customdata=customdata,
                hovertemplate=(
                    "<b>%{x:.2f}</b><br>"
                    "Density: %{y:.4f}<br>"
                    "Mean: %{customdata[0]:.2f}<br>"
                    "Median: %{customdata[1]:.2f}<br>"
                    "Std Dev: %{customdata[2]:.2f}<extra></extra>"
                ),
                showlegend=False,
                name=f"KDE: {col}"
            ),
            row=r, col=c
        )
    fig.update_xaxes(title_text="", row=r, col=c)
    fig.update_yaxes(title_text="", row=r, col=c)
fig.update_layout(
    height=900,
    width=1200,
    title_text="Distributions of Key Indicators (Interactive)",
    title_x=0.5,
    template="plotly_white",
    bargap=0.05,
    margin=dict(t=80),
    hovermode="x unified"  # unified hover improves side-by-side readability
)
fig.show()
📊 Distributions of Key Indicators – What Jumps Out?¶
1. Economic Scale & Trade¶
gdp_usd & total_imports_usd
- Heavily right-skewed → a handful of very large economies dominate the axis, while most countries cluster near the origin.
- Implication: log-transform before PCA to avoid outsized influence from the U.S./China tier.
2. Growth & Inflation Dynamics¶
gdp_growth_pct
- Roughly bell-shaped around ~3 %, but tails extend to -18 % and +25 % → captures crises, rebounds & commodity booms.
inflation_pct
- Long right tail (up to ~70 %) shows sporadic high-inflation episodes; the bulk of countries sit below 10 %.
- Implication: winsorise or cap extreme inflation outliers to stabilise variance.
3. Cost Competitiveness¶
wage_usd
- Skewed right with a sharp spike under $1 000 → signals a clear low-wage cohort; thin tail up to ~$8.5 k.
- Implication: segmenting by wage quintiles should help separate cost-advantaged markets.
4. Industrial & Logistics Readiness¶
manufacturing_pct_gdp
- Mild right skew; majority between 10–20 %, with a secondary bump >25 % (classic “factory economies”).
lpi_score
- Fairly symmetric 2.2–4.2 range; most countries hover around the global mean (~3.2).
- Implication: enough dispersion to let the Logistics pillar differentiate markets.
5. Governance & Risk¶
political_stability_est
- Bimodal feel: cluster around -0.5 (moderate risk) and +0.7 (stable). Very few fall below -2.5 (failed-state zone).
total_disorder_events
- Classic long-tail distribution: zero events for many country-years (partly because ACLED does not cover every country-year; see acled_covered), but a handful exceed 20 k events.
- Implication: log(x + 1) or percentile ranking recommended before combining with governance scores.
Overall takeaway: Several key variables are highly skewed; applying log/winsorisation before PCA will prevent extreme values from dominating principal components and clustering results.
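Both remedies named above take only a few lines in pandas and NumPy. A minimal sketch, assuming 1st/99th-percentile caps as the default (the notebook does not fix the winsorising cut-offs); log1p_transform and winsorise are hypothetical helpers.

```python
import numpy as np
import pandas as pd


def log1p_transform(s: pd.Series) -> pd.Series:
    """log(x + 1): compresses long right tails while keeping zeros at zero."""
    if (s < 0).any():
        raise ValueError("log1p_transform expects non-negative values")
    return np.log1p(s)


def winsorise(s: pd.Series, lower: float = 0.01, upper: float = 0.99) -> pd.Series:
    """Cap values below/above the given quantiles at those quantiles."""
    lo, hi = s.quantile([lower, upper])
    return s.clip(lower=lo, upper=hi)


# Disorder-event counts spanning the range in the summary table (0 to 23,311)
events = pd.Series([0, 0, 340, 5_000, 23_311])
print(log1p_transform(events).round(2).tolist())

# Winsorising 0..100 at the 10th/90th percentiles caps the series to [10, 90]
capped = winsorise(pd.Series(range(101), dtype=float), 0.10, 0.90)
print(capped.min(), capped.max())
```

log1p suits non-negative counts such as total_disorder_events; winsorising suits signed variables such as inflation_pct or fdi_net_inflows_pct_gdp, where a log is undefined for negative values.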
3.3 EDA Step 3: Correlation Structure¶
To understand the relationships between our variables, we now generate a correlation heatmap that highlights only the most important connections.
Two design choices keep it clean and readable:
- Triangular Layout: because the correlation matrix is symmetric, the plot shows only the lower triangle, removing redundant cells.
- Selective Annotations: instead of printing every coefficient, we annotate only correlations with an absolute value of at least 0.35, which draws attention to the strongest positive and negative relationships in the data.
This map is crucial for spotting multicollinearity and forming hypotheses before we build our PCA models. 🔗
# --- Correlation Heatmap | Annotate |ρ| ≥ 0.35 ------------------------------
num_cols = master_df.select_dtypes(include=[np.number]).columns
corr = master_df[num_cols].corr().round(2)
# Mask the upper triangle → white space on the right
corr_visible = corr.mask(np.triu(np.ones_like(corr, dtype=bool), k=1))
fig = go.Figure(
    data=go.Heatmap(
        z=corr_visible.values,
        x=corr_visible.columns,
        y=corr_visible.index,
        colorscale="RdBu",
        zmin=-1, zmax=1,
        hovertemplate="%{y} vs %{x}<br>ρ = %{z}<extra></extra>"
    )
)
# Prepare annotation lists
annot_x, annot_y, annot_txt, annot_col = [], [], [], []
for i in range(corr_visible.shape[0]):
    for j in range(i):  # lower triangle only
        val = corr_visible.iat[i, j]
        if val >= 0.35 or val <= -0.35:  # annotate both strong pos & neg
            annot_x.append(corr_visible.columns[j])
            annot_y.append(corr_visible.index[i])
            annot_txt.append(f"{val:.2f}")
            # Light text on blue (positive), dark text on red (negative)
            annot_col.append("white" if val >= 0.35 else "black")
fig.add_trace(
    go.Scatter(
        x=annot_x, y=annot_y, text=annot_txt,
        mode="text",
        textfont=dict(size=9, color=annot_col),
        showlegend=False,  # no stray "trace 1" legend entry
        hoverinfo="skip"   # keep the heatmap's own hover
    )
)
fig.update_layout(
    title="Correlation Heatmap – Numeric Indicators (2010-23)",
    title_x=0.5,
    width=950, height=750,
    template="plotly_white",
    xaxis_showgrid=False, yaxis_showgrid=False,
    margin=dict(t=80, l=120)
)
fig.update_xaxes(tickangle=45)
fig.show()
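The annotation rule in the cell above can also produce a sortable table of the flagged pairs, which is easier to cite than values read off the heatmap. A minimal sketch applying the same lower-triangle, |ρ| ≥ 0.35 rule; strong_pairs is a hypothetical helper and the toy columns are synthetic.

```python
import numpy as np
import pandas as pd


def strong_pairs(df: pd.DataFrame, threshold: float = 0.35) -> pd.DataFrame:
    """List lower-triangle variable pairs whose Pearson |ρ| >= threshold,
    strongest first (same rule as the heatmap annotations)."""
    corr = df.select_dtypes(include=[np.number]).corr()
    cols = corr.columns
    rows = [
        (cols[i], cols[j], corr.iat[i, j])
        for i in range(len(cols))
        for j in range(i)  # j < i: lower triangle, diagonal excluded
        if abs(corr.iat[i, j]) >= threshold
    ]
    out = pd.DataFrame(rows, columns=["var_1", "var_2", "rho"])
    return out.sort_values("rho", key=lambda s: s.abs(), ascending=False,
                           ignore_index=True)


# Synthetic data: b = 2a (ρ = +1), c = -a (ρ = -1), d uncorrelated with all three
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [-1, -2, -3, -4, -5],
    "d": [1, -1, 1, -1, 1],
})
print(strong_pairs(toy))
```

Calling strong_pairs(master_df) should reproduce the annotated cells as a table.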
🔍 Correlation Heatmap — Quick Takeaways¶
1. Strong Positive Clusters (ρ ≥ 0.50)¶
Governance Trio
- rule_of_law_est ↔ control_of_corruption_est (0.96)
- rule_of_law_est ↔ political_stability_est (0.75)
- control_of_corruption_est ↔ political_stability_est (0.78)
- Implication: the three governance metrics track the same latent construct; we can safely combine or reduce them in PCA.
Economic Scale
- total_imports_usd ↔ gdp_usd (0.70)
- population ↔ gdp_usd (0.55)
- Implication: import volumes largely reflect overall market size; consider log-scaling to temper their weight.
Cost & Logistics
- lpi_score ↔ wage_usd (0.70)
- lpi_score ↔ industry_pct_gdp (0.52)
- Implication: more advanced logistics systems tend to sit in higher-wage, highly industrialised economies, a clear cost vs. efficiency trade-off.
Critical-Minerals Cluster
- manganese_ore ↔ nickel,_mine (0.79)
- graphite ↔ lithium_minerals (0.74)
- Implication: certain minerals co-occur; collapsing these into a single “mineral abundance” factor will avoid double-counting in PCA.
2. Noticeable Negative Links (ρ ≤ -0.40)¶
Governance vs. Political Violence
- political_stability_est shows a moderate inverse relationship with total_disorder_events (~-0.37, just shy of the -0.40 cut-off).
- Implication: while directionally correct, the magnitude suggests we need both variables to capture risk fully.
No other correlations breach the –0.40 threshold, indicating that strong inverse relationships are rare in this dataset.
3. Strategic Takeaways for Modelling¶
- Dimensionality Reduction – Merge or PCA-compress the highly collinear governance and mineral blocks.
- Trade-off Narrative – The positive link between wages and logistics will underpin a cost-vs-efficiency quadrant in the final 2 × 2.
- Retain Risk Variables – Because negative correlations are modest, instability metrics still add orthogonal information and should feed directly into the Risk pillar.
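The "merge or PCA-compress" step for the governance trio can be sketched as a one-component PCA on standardised inputs. Synthetic data stand in for the three WGI estimates; first_component is a hypothetical helper, and the sign flip is a convention added here because the sign of a principal component is arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def first_component(df: pd.DataFrame, cols: list):
    """Standardise `cols`, fit a 1-component PCA and return
    (PC1 scores as a Series, share of variance explained by PC1)."""
    X = StandardScaler().fit_transform(df[cols])
    pca = PCA(n_components=1)
    scores = pca.fit_transform(X)[:, 0]
    # Component signs are arbitrary: flip so PC1 rises with the first column
    if pca.components_[0, 0] < 0:
        scores = -scores
    return pd.Series(scores, index=df.index, name="pc1"), pca.explained_variance_ratio_[0]


rng = np.random.default_rng(0)
latent = np.linspace(-2, 2, 60)  # synthetic "governance quality"
toy = pd.DataFrame({
    "rule_of_law_est": latent + rng.normal(0, 0.1, 60),
    "control_of_corruption_est": latent + rng.normal(0, 0.1, 60),
    "political_stability_est": latent + rng.normal(0, 0.3, 60),
})
gov_pc1, share = first_component(toy, list(toy.columns))
print(f"PC1 explains {share:.1%} of governance variance")
```

The same helper applies to the mineral block, although zero-inflated production columns may call for a log1p step first.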
3.4 EDA Step 4: Pairwise Deep-Dives with Scatter Plots¶
Let's create a few targeted scatter plots to explore some of the core trade-offs and relationships in our data. Scatter plots are excellent for visually confirming the strength and direction of a relationship between two specific variables.
We will produce two targeted scatter plots that capture the most strategic trade-offs for a gigafactory site decision:
| Pair | Pillars Involved | Strategic Question Addressed |
|---|---|---|
| wage_usd × lpi_score | Cost Competitiveness ↔ Supply-Chain Readiness | How does logistics quality change as labour costs rise? |
| gdp_usd × total_imports_usd | Market Opportunity | Are the biggest economies also the largest import hubs, or are there “gateway” trade magnets punching above their GDP weight? |
Why just these two?
- Both show a meaningful, but not redundant, correlation (|ρ| ≈ 0.70) in the heat-map—strong enough to warrant visual confirmation, yet still rich in potential outliers.
- Together they address the core executive narratives:
- Cost vs. Efficiency — balancing low wages against robust logistics.
- Scale vs. Openness — gauging whether market size aligns with import intensity.
Other high correlations—such as the governance trio or mineral co-occurrences—are intra-pillar and will be compressed later via PCA, so additional scatter plots would add minimal incremental insight at this exploratory stage.
Deep-Dive: The Cost vs. Efficiency Trade-off¶
Cost vs. Efficiency — Three Economic Layers in a Single Frame¶
- Feature engineering: create a new gdp_per_capita column (GDP ÷ population) to proxy overall economic maturity.
- Plot design:
  - x-axis: lpi_score (Logistics Performance Index).
  - y-axis: wage_usd on a log scale to contain its long, right-skewed tail.
- Chromatic cue: colour points by gdp_per_capita (log scale) so high-income economies stand out on the chart.
- Trendline: an OLS fit (in log-wage space) quantifies how steeply labour costs climb with incremental gains in logistics quality.
Together these layers expose the pivotal trade-off: by how much, in proportional terms, do monthly wages rise for each one-point gain in logistical reliability?
# --- Feature Engineering -----------------------------------------------------
master_df["gdp_per_capita"] = master_df["gdp_usd"] / master_df["population"]
df_plot = (master_df
.loc[master_df["wage_usd"] > 0,
["country", "year", "lpi_score", "wage_usd", "gdp_per_capita"]]
.assign(gdp_pc_log=lambda d: np.log10(d["gdp_per_capita"])))
import statsmodels.api as sm  # OLS trendline (go and np are imported in the setup cell)
fig = go.Figure()
# Scatter markers
fig.add_trace(
go.Scatter(
x=df_plot["lpi_score"],
y=df_plot["wage_usd"],
mode="markers",
showlegend=False, # remove redundant legend
marker=dict(
color=df_plot["gdp_pc_log"],
colorscale="Viridis",
showscale=True,
colorbar=dict(
title="log₁₀ GDP pc",
tickvals=[np.log10(v) for v in (1_000, 10_000, 50_000)],
ticktext=["$1k", "$10k", "$50k"],
x=1.02 # nudge away from plot edge
),
size=7,
opacity=0.85,
line=dict(width=0.5, color="white")
),
customdata=df_plot[["country", "year", "gdp_per_capita"]],
hovertemplate="<b>%{customdata[0]}</b> (%{customdata[1]:.0f})<br>" +
"LPI %{x:.2f}<br>" +
"Wage $%{y:,.0f}<br>" +
"GDP pc $%{customdata[2]:,.0f}<extra></extra>"
)
)
# OLS line (fit in log‑wage space)
X = sm.add_constant(df_plot["lpi_score"])
model = sm.OLS(np.log(df_plot["wage_usd"]), X).fit()
x_line = np.linspace(df_plot["lpi_score"].min(), df_plot["lpi_score"].max(), 100)
y_line = np.exp(model.params.iloc[0] + model.params.iloc[1] * x_line)  # .iloc: params is a labelled Series
fig.add_trace(
go.Scatter(
x=x_line,
y=y_line,
mode="lines",
line=dict(color="#F86302", width=3, dash="dash"),
showlegend=False,
hovertemplate="<b>OLS trend</b><br>Slope %{customdata[0]:.2f}<br>" +
"R² %{customdata[1]:.2f}<extra></extra>",
customdata=np.column_stack([np.full_like(x_line, model.params.iloc[1]),
np.full_like(x_line, model.rsquared)])
)
)
fig.update_layout(
title="Cost vs Efficiency: Logistics Quality vs. Manufacturing Wage",
title_x=0.5,
xaxis_title="Logistics Performance Index (higher = better)",
yaxis_title="Monthly Manufacturing Wage (USD, log scale)",
yaxis_type="log",
template="plotly_white",
hovermode="x unified",
height=600, width=900,
margin=dict(t=80)
)
fig.show()
Key Insights — Cost vs Efficiency Scatter¶
- Steep climb: the OLS slope (~1.66, fitted on natural-log wages) means each one-point gain in LPI is associated with wages roughly e^1.66 ≈ 5.3× higher, so better logistics comes with a materially higher wage bill.
- Inflection near LPI ≈ 3.7: beyond this hinge, wages accelerate faster than logistics improves—an implied efficiency premium.
- Bimodal basin:
- Value‐sweet‑spot: LPI 2.6–3.3 with wages < $500 — lean markets where upgrading ports/roads could yield outsized returns.
- Premium quadrant: LPI > 3.7 with wages > $2 000 — turnkey, low‑risk hubs for investors prioritising supply‑chain certainty.
- Emergent outliers: a handful of countries deliver LPI > 3.3 at wages well south of $1 000 — hidden gems for greenfield entrants.
- Colour gradient narrates prosperity: the brightest (high GDP pc) dots crowd the premium quadrant, underscoring how national wealth co‑evolves with both wages and infrastructure.
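Because the trendline above is fitted on natural-log wages (np.log), its slope b reads as a multiplier: a one-point LPI gain is associated with wages about e^b times higher. A minimal check on synthetic data built with b = 1.66 (the slope quoted above). The helper wage_multiplier_per_lpi_point is hypothetical and uses np.polyfit instead of the notebook's statsmodels OLS; both fit the same least-squares line.

```python
import numpy as np


def wage_multiplier_per_lpi_point(lpi, wages):
    """Fit log(wage) = a + b * LPI by least squares and return (b, e**b),
    where e**b is the wage ratio implied by a one-point LPI gain."""
    b, _a = np.polyfit(lpi, np.log(wages), deg=1)  # highest degree first
    return b, float(np.exp(b))


# Synthetic check: wages built to grow exactly e**1.66 ≈ 5.26x per LPI point
lpi = np.array([2.2, 2.8, 3.2, 3.7, 4.2])
wages = 40 * np.exp(1.66 * lpi)
slope, multiplier = wage_multiplier_per_lpi_point(lpi, wages)
print(f"slope={slope:.2f}, wages x{multiplier:.2f} per LPI point")
```

Since the LPI spans only about two points (2.2 to 4.2), a 0.1-point improvement is the more practical unit: e^0.166 ≈ 1.18, i.e. about 18 % higher wages.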
Scale vs Openness — Does GDP Alone Explain Import Appetite?¶
- Objective: test whether bulk economic output (GDP) necessarily drives battery-related imports, or whether “gateway” economies import disproportionately in order to re-export.
- Axes:
  - x-axis: gdp_usd (log scale).
  - y-axis: total_imports_usd (log scale).
- Chromatic cue: colour by population (log scale) to flag mega-markets versus small yet trade-heavy states.
- Trendline: an OLS fit in log-log space estimates the elasticity of imports with respect to GDP.
df_trade = (master_df
.loc[(master_df["gdp_usd"] > 0) & (master_df["total_imports_usd"] > 0),
["country", "year", "gdp_usd", "total_imports_usd", "population"]]
.assign(pop_log=lambda d: np.log10(d["population"])))
fig = go.Figure()
# Scatter markers
fig.add_trace(
go.Scatter(
x=df_trade["gdp_usd"],
y=df_trade["total_imports_usd"],
mode="markers",
showlegend=False,
marker=dict(
color=df_trade["pop_log"],
colorscale="Cividis",
showscale=True,
colorbar=dict(
title="log₁₀ Population",
tickvals=[6, 7, 8, 9],
ticktext=["1 M", "10 M", "100 M", "1 B"],
x=1.02
),
size=7,
opacity=0.85,
line=dict(width=0.5, color="white")
),
customdata=df_trade[["country", "year", "population"]],
hovertemplate="<b>%{customdata[0]}</b> (%{customdata[1]:.0f})<br>" +
"GDP $%{x:,.0f}<br>" +
"Imports $%{y:,.0f}<br>" +
"Population %{customdata[2]:,}<extra></extra>"
)
)
# OLS in log–log space
X = sm.add_constant(np.log10(df_trade["gdp_usd"]))
model = sm.OLS(np.log10(df_trade["total_imports_usd"]), X).fit()
x_line = np.linspace(df_trade["gdp_usd"].min(), df_trade["gdp_usd"].max(), 100)
y_line = 10 ** (model.params.iloc[0] + model.params.iloc[1] * np.log10(x_line))  # .iloc: params is a labelled Series
fig.add_trace(
go.Scatter(
x=x_line,
y=y_line,
mode="lines",
line=dict(color="#F86302", width=3, dash="dash"),
showlegend=False,
hovertemplate="<b>OLS trend</b><br>" +
"Elasticity %{customdata[0]:.2f}<br>" +
"R² %{customdata[1]:.2f}<extra></extra>",
customdata=np.column_stack([np.full_like(x_line, model.params.iloc[1]),
np.full_like(x_line, model.rsquared)])
)
)
fig.update_layout(
title="Scale vs Openness: GDP vs. Battery-Related Imports",
title_x=0.5,
xaxis=dict(title="GDP (USD, log scale)", type="log"),
yaxis=dict(title="Battery-Related Imports (USD, log scale)", type="log"),
template="plotly_white",
hovermode="x unified",
height=600, width=900,
margin=dict(t=80)
)
fig.show()
Key Insights — Scale vs Openness Scatter¶
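- Super-unitary elasticity: the trendline’s slope (~1.30) implies imports rise more than one-for-one with GDP. A 1 % larger economy is associated with roughly 1.3 % more battery-related imports, so a tenfold GDP gap maps to about a 20-fold import gap (10^1.3 ≈ 20).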
- Gateway over‑performers (above the line): a tight cadre imports far more than their GDP suggests, signalling re‑export hubs or deep GVC integration.
- Import‑light behemoths (below the line): several mega‑economies under‑import, hinting at protectionist leanings or robust domestic supply chains.
- Population tint clarifies nuance:
- Gold hues (>1 B people) drift below the line — scale without openness.
- Mid‑sized states of varying colours inhabit both sides, proving that openness is orthogonal to headcount.
- Strategic takeaway: market size alone is an imperfect proxy for trade opportunity; overlaying import intensity reveals outsized transit hubs worth courting for regional distribution.
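Because the trendline is fitted in log–log space, its slope reads directly as an elasticity. A minimal numpy sketch on synthetic data (the power-law coefficients and noise level are invented for illustration, and `np.polyfit` stands in for the notebook's statsmodels OLS, which yields the same least-squares line) shows a slope of about 1.3 translating into roughly 1.3 % more imports per 1 % of GDP:

```python
import numpy as np

# Synthetic data only: imports follow a power law imports = a * GDP**b with b = 1.3
rng = np.random.default_rng(0)
gdp = 10 ** rng.uniform(9, 13, size=500)          # GDP between $1 B and $10 T
true_b = 1.3
noise = 10 ** rng.normal(0, 0.1, size=gdp.size)    # multiplicative scatter around the trend
imports = 1e-4 * gdp ** true_b * noise

# Straight-line fit in log10-log10 space: the slope is the elasticity
slope, intercept = np.polyfit(np.log10(gdp), np.log10(imports), deg=1)

# With slope b, a 1 % larger economy imports (1.01**b - 1) * 100 % more
pct_effect = 100 * (1.01 ** slope - 1)
print(f"elasticity ≈ {slope:.2f}; +1 % GDP → +{pct_effect:.2f} % imports")
```

A slope below 1 would instead mean imports grow more slowly than the economy, which is the "import-light" pattern described above.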
3.5 EDA: Orientation Map – Political Stability
Purpose. Before modelling, we ground the reader in where our governance-risk data sits on the globe with a single interactive choropleth:
- Geographic context: highlights risk hot- and cool-spots at a glance.
- Time slider (2010–23): shows how stability has evolved, hinting at the trend momentum we'll capture later in the Risk pillar.
- Colour-blind-safe palette (Viridis-based): ensures every stakeholder can read the map.
- Clean cartography: Natural Earth projection, subtle land/sea contrast and white borders yield a slide-ready image.
# --- Choropleth : Political Stability Index (executive‑grade) ----------------
# 1️⃣ Optional tidy‑up: round to two decimals for cleaner hovers
df_map = master_df.copy()
df_map["political_stability_est"] = df_map["political_stability_est"].round(2)
# 2️⃣ Viridis-based sequential scale (CVD friendly); the symmetric range_color below puts 0 on the mid stop
stability_scale = [
[0.00, "#440154"], # deep purple (very unstable)
[0.25, "#31688e"], # teal‑blue
[0.50, "#35b779"], # mint (≈ 0)
[0.75, "#fde725"], # yellow‑lime
[1.00, "#ffcc33"] # warm yellow (very stable)
]
fig = px.choropleth(
df_map,
locations="country",
locationmode="country names",
color="political_stability_est",
animation_frame="year",
color_continuous_scale=stability_scale,
range_color=(-2.5, 2.5),
hover_name="country",
hover_data={"political_stability_est": True},
)
# 3️⃣ Cartographic finesse
fig.update_geos(
projection_type="natural earth", # smoother than Robinson in Plotly
fitbounds="locations", # auto‑zoom to data; trims poles
showcountries=True, countrycolor="white",
showcoastlines=True, coastlinecolor="white",
showland=True, landcolor="#F2F2F2",
showocean=True, oceancolor="#E8F7FF"
)
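# 4️⃣ Borders & hover text
# update_traces styles only the initial traces; each animation frame carries its own
# trace copies, so the same settings are applied to every frame as well
hover = "<b>%{location}</b><br>Stability: %{z:.2f}<extra></extra>"
fig.update_traces(marker_line_color="white", marker_line_width=0.5, hovertemplate=hover)
for frame in fig.frames:
    for trace in frame.data:
        trace.update(marker_line_color="white", marker_line_width=0.5, hovertemplate=hover)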
# 5️⃣ Layout polish
fig.update_layout(
width=1280, height=700,
margin=dict(t=80, l=0, r=0, b=0),
title="Political Stability – World Bank Governance Indicators (2010 – 2023)",
title_x=0.5,
template="plotly_white",
coloraxis_colorbar=dict(
title="Political<br>Stability",
tickmode="array",
tickvals=[-2, -1, 0, 1, 2],
ticktext=["–2", "–1", "0", "1", "2"],
lenmode="fraction", len=0.65,
yanchor="middle", y=0.5
),
hovermode="closest" # cleaner than unified for maps
)
fig.show()
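Because `range_color` is fixed at (−2.5, 2.5), every animation frame maps estimates onto the colour scale the same way, so colours stay comparable across years. The helper below (a hypothetical name, not part of Plotly) sketches that linear mapping, assuming values outside the range are drawn in the end colours: 0 lands on the 0.50 mint stop and ±1.25 on the 0.25 / 0.75 stops.

```python
def colour_position(value, vmin=-2.5, vmax=2.5):
    """Map a stability estimate onto the 0-1 axis used by the colour-scale stops.

    Out-of-range values are clipped to the ends (assumed to mirror how a fixed
    colour range renders them in the end colours).
    """
    pos = (value - vmin) / (vmax - vmin)
    return min(max(pos, 0.0), 1.0)

for v in (-2.5, -1.25, 0.0, 1.25, 2.5, 3.0):
    print(f"{v:+.2f} → {colour_position(v):.2f}")
```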
4 · Data Preparation — From Raw Metrics to Model-Ready Matrix
Before we can run PCA or clustering, every numeric indicator must be on a comparable scale:
1. Correct extreme right skew. Variables such as GDP, imports, wages, population and mineral tonnage span several orders of magnitude. Action: apply a log₁₀(x + 1) transform to heavily right-skewed, non-negative columns, so that a $1 T vs. $10 T economy differs by about 1 unit instead of 9 (in $ T).
2. Impute, then standardise all numeric columns. PCA finds the directions of maximum variance, so without standardisation the variables with the largest raw variance dominate the components. Action: fill gaps with the column median, then feed the log-adjusted matrix into StandardScaler() to produce z-scores (μ ≈ 0, σ ≈ 1).
3. Preserve identifiers and keep a clean copy. country and year are excluded from scaling; the scaled matrix keeps master_df's index, so it can later be merged back with country and year to interpret scores and plot maps.
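The first two steps can be sanity-checked on a toy vector. This numpy sketch reuses the notebook's `log10_1p` formula; the GDP values are hypothetical, and the manual z-score uses the population standard deviation (ddof = 0), which is what `StandardScaler` applies:

```python
import numpy as np

def log10_1p(x):
    """log10(x + 1), computed via log1p for accuracy near zero."""
    return np.log1p(x) / np.log(10)

# Hypothetical GDP values in USD: $1 B, $1 T and $10 T
gdp = np.array([1e9, 1e12, 1e13])
logged = log10_1p(gdp)
print(logged.round(3))              # roughly 9, 12 and 13
print(logged[2] - logged[1])        # a 10x gap becomes about one unit

# Step 2: z-score with the population std (ddof=0), as StandardScaler does
z = (logged - logged.mean()) / logged.std()
print(z.round(3), z.mean().round(6), z.std().round(6))
```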
import numpy as np, pandas as pd, joblib
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.impute import SimpleImputer
from scipy.stats import skew
# ---------------------------------------------------------------------------
# Helper ─ stable log10(x+1)
def log10_1p(x):
    """Vector-safe log10(x + 1)."""
    return np.log1p(x) / np.log(10)
# ---------------------------------------------------------------------------
# 1. Build the preprocessing pipeline
df_mod = master_df.copy()
id_cols = ["country", "year"]
num_cols = [c for c in df_mod.select_dtypes("number").columns if c not in id_cols]
# ── detect heavy RIGHT skew (skew > 1) in non-negative columns;
#    log10(x+1) compresses a long right tail but would stretch a left tail
skew_vals = df_mod[num_cols].apply(skew, nan_policy="omit")
skew_flags = (
(skew_vals > 1)
& (df_mod[num_cols].min() >= 0) # only log cols with no negatives
)
skew_cols = skew_flags[skew_flags].index.tolist()
non_skew = skew_flags[~skew_flags].index.tolist()
log10_tf = FunctionTransformer(log10_1p, feature_names_out="one-to-one")
preprocessor = ColumnTransformer(
transformers=[
("log10",
Pipeline([
("impute", SimpleImputer(strategy="median")),
("log", log10_tf),
("scale", StandardScaler())
]),
skew_cols),
("standard",
Pipeline([
("impute", SimpleImputer(strategy="median")),
("scale", StandardScaler())
]),
non_skew)
],
remainder="drop"
)
# ---------------------------------------------------------------------------
# 2. Fit-transform & persist the artefact
X_prepared = preprocessor.fit_transform(df_mod[num_cols])
df_scaled = pd.DataFrame(X_prepared,
columns=skew_cols + non_skew,
index=df_mod.index)
# Re-bind on every run so scaled_df never points at a stale matrix from an earlier execution
scaled_df = df_scaled # both names now refer to the same DataFrame
print("ℹ️ Aliased df_scaled ➜ scaled_df")
joblib.dump(preprocessor, "pca_preprocessor.pkl")
print("✅ Transformer saved to pca_preprocessor.pkl")
# quick QA
display(df_scaled.describe().T.head())
ℹ️ Aliased df_scaled ➜ scaled_df ✅ Transformer saved to pca_preprocessor.pkl
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| gdp_usd | 1,015.00 | 0.00 | 1.00 | -2.14 | -0.78 | 0.08 | 0.63 | 2.75 |
| population | 1,015.00 | -0.00 | 1.00 | -2.28 | -0.64 | -0.17 | 0.73 | 2.80 |
| access_to_electricity_pct | 1,015.00 | 0.00 | 1.00 | -16.08 | 0.18 | 0.24 | 0.24 | 0.24 |
| electric_power_consumption_kwh_pc | 1,015.00 | 0.00 | 1.00 | -3.43 | -0.63 | 0.12 | 0.68 | 2.99 |
| gross_capital_formation_pct_gdp | 1,015.00 | -0.00 | 1.00 | -3.05 | -0.55 | 0.01 | 0.53 | 4.01 |
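Scaling check: all numeric features now have μ ≈ 0 and σ ≈ 1, and the right tails are compressed (e.g. gdp_usd now spans −2.14 to +2.75). The exception is access_to_electricity_pct, which keeps a long left tail (min ≈ −16σ, with the median, 75th percentile and max all at +0.24): most observations sit at the ceiling and a few far below it, a shape a log transform cannot compress. With that outlier tail in mind, the dataset is ready for pillar-level PCA.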
5 · Modelling — Index Construction with PCA
5.1 Pillar 1 · Market & Economic Opportunity
Objective. Collapse five market indicators (three absolute-scale measures plus two ratios, FDI inflows and GDP growth) into a single, unit-free market_score.
We apply Principal Component Analysis (PCA) to the z-scored matrix and then auto-orient the first component so that larger markets always receive higher scores.
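Auto-orientation matters because an eigenvector's sign is arbitrary: PCA can return the same axis pointing either way. A numpy-only sketch (standing in for scikit-learn's `PCA`; the data are synthetic and `first_pc_oriented` is a hypothetical helper) fixes the sign by requiring a positive loading on a reference column, as the notebook does with `gdp_usd`:

```python
import numpy as np

def first_pc_oriented(Z, ref_idx):
    """First principal component of Z, sign-fixed so the loading on column
    `ref_idx` is positive. Returns (scores, loadings, explained_share)."""
    Zc = Z - Z.mean(axis=0)                                      # PCA works on centred data
    eigvals, eigvecs = np.linalg.eigh(np.cov(Zc, rowvar=False))  # eigenvalues in ascending order
    v = eigvecs[:, -1]                                           # direction of maximum variance
    sign = np.sign(v[ref_idx]) or 1.0                            # eigenvector sign is arbitrary; fix it
    v = v * sign
    share = eigvals[-1] / eigvals.sum()                          # explained-variance ratio of PC-1
    return Zc @ v, v, share

# Synthetic "market mass" features: three noisy copies of one latent size factor
rng = np.random.default_rng(42)
size = rng.standard_normal(300)
Z = np.column_stack([size + 0.3 * rng.standard_normal(300) for _ in range(3)])

scores, loadings, share = first_pc_oriented(Z, ref_idx=0)   # column 0 plays the role of gdp_usd
print(loadings.round(2), f"PC-1 share: {share:.1%}")
```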
# --- PCA · Pillar 1 : Market & Economic Opportunity (auto‑oriented) ----------
from sklearn.decomposition import PCA
import numpy as np, pandas as pd
print("— PCA on Pillar 1: Market & Economic Opportunity —")
# 1️⃣ Feature bundle for the Market pillar
pillar1_features = [
"gdp_usd",
"gdp_growth_pct",
"population",
"fdi_net_inflows_pct_gdp",
"total_imports_usd",
]
# 2️⃣ Extract the z‑scored columns from the already‑scaled matrix
df_p1 = scaled_df[pillar1_features]
# 3️⃣ Fit PCA (one component)
pca_p1 = PCA(n_components=1, random_state=42)
pc1_raw = pca_p1.fit_transform(df_p1).ravel()
# 4️⃣ Auto‑orient so that the loading on GDP is positive
gdp_loading = pca_p1.components_[0][pillar1_features.index("gdp_usd")]
sign_correction = np.sign(gdp_loading) or 1 # fallback to +1 if 0
market_score = pc1_raw * sign_correction
loadings_oriented = pd.Series(
pca_p1.components_[0] * sign_correction,
index=pillar1_features,
name="loading"
)
# 5️⃣ Append the oriented score to both dataframes
for df in (master_df, scaled_df):
    df["market_score"] = market_score
# 6️⃣ Diagnostics
expl_var = pca_p1.explained_variance_ratio_[0]
print(f"✅ PCA complete — PC‑1 captures {expl_var:.1%} of pillar variance.\n")
display(loadings_oriented.sort_values(key=abs, ascending=False).to_frame()
.style.format("{:+0.2f}"))
print("\nTop / bottom market scores:")
display(master_df[["country", "year", "market_score"]]
.sort_values("market_score", ascending=False).head())
display(master_df[["country", "year", "market_score"]]
.sort_values("market_score", ascending=True).head())
— PCA on Pillar 1: Market & Economic Opportunity — ✅ PCA complete — PC‑1 captures 51.5% of pillar variance.
| | loading |
|---|---|
| gdp_usd | +0.60 |
| total_imports_usd | +0.56 |
| population | +0.54 |
| fdi_net_inflows_pct_gdp | -0.16 |
| gdp_growth_pct | -0.08 |
Top / bottom market scores:
| | country | year | market_score |
|---|---|---|---|
| 161 | China | 2022 | 4.41 |
| 987 | United States | 2023 | 4.08 |
| 159 | China | 2020 | 4.07 |
| 160 | China | 2021 | 4.06 |
| 157 | China | 2018 | 4.02 |

| | country | year | market_score |
|---|---|---|---|
| 599 | Malta | 2018 | -4.53 |
| 602 | Malta | 2021 | -4.34 |
| 598 | Malta | 2017 | -4.06 |
| 212 | Cyprus | 2019 | -3.80 |
| 591 | Malta | 2010 | -3.50 |
Interpreting the Market & Economic Opportunity PCA
| Diagnostic | Insight |
|---|---|
| Explained variance = 51.5 % | A single component captures just over half of the variability across five heterogeneous input metrics: an efficient compression. |
| Loadings (mixed signs) | • gdp_usd, total_imports_usd and population dominate (+0.54 to +0.60), confirming that PC-1 is a market-mass axis. • fdi_net_inflows_pct_gdp (−0.16) and gdp_growth_pct (−0.08) load weakly negative: for a given set of size indicators, a higher FDI-to-GDP ratio or faster growth slightly lowers the score rather than raising it. |
| Top scorers (China, United States) | Mega-economies sit at the positive extreme, as expected after auto-orientation, because their absolute scale dwarfs peers on any z-standardised basis. |
| Bottom scorers (Malta, Cyprus) | Small economies fall to the negative tail because scale drives the score; their high FDI-to-GDP ratios, if anything, push them slightly lower given the negative FDI loading. |
Take-away.
market_score is essentially a market-size index: its magnitude measures overall heft, while the two ratio indicators contribute little. The auto-orientation rule guarantees "higher = bigger market" within this pillar; each later pillar applies its own orientation rule.
Sanity check: after orientation, mega-markets top the list (China, United States), while small FDI-dependent economies (Malta, Cyprus) sit at the bottom.
This confirms the market_score orientation: higher values correspond to larger, higher-capacity markets.
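Since a PC score is the dot product of the centred inputs with the loading vector, each feature's contribution is simply loading × z-score, and a negative loading turns a high value into a drag on the score. The sketch below uses the rounded loadings printed above and invented z-scores for a hypothetical small, FDI-heavy economy, so the total only approximates a real `market_score`:

```python
# Oriented Pillar 1 loadings, rounded as printed above
loadings = {
    "gdp_usd": 0.60,
    "total_imports_usd": 0.56,
    "population": 0.54,
    "fdi_net_inflows_pct_gdp": -0.16,
    "gdp_growth_pct": -0.08,
}

# Hypothetical z-scores for a small, FDI-heavy economy (illustrative, not real data)
z = {
    "gdp_usd": -2.0,
    "total_imports_usd": -1.5,
    "population": -2.0,
    "fdi_net_inflows_pct_gdp": 3.0,
    "gdp_growth_pct": 1.0,
}

# Each feature's contribution to the score is loading × z-score
contrib = {name: loadings[name] * z[name] for name in loadings}
score = sum(contrib.values())

for name, c in sorted(contrib.items(), key=lambda kv: kv[1]):
    print(f"{name:<26} {c:+.2f}")
print(f"{'market_score (approx.)':<26} {score:+.2f}")
```

Note that the strong FDI ratio (z = +3) subtracts from the score rather than adding to it.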
5.2 Pillar 2 · Cost Competitiveness
Goal. Fuse the key cost-pressure indicators into a single cost_score whose higher values flag cheaper, more cost-advantaged markets.
Inputs
| Indicator | Why it matters | Direction we want |
|---|---|---|
| wage_usd | Direct labour cost | ↓ cheaper = better |
| inflation_pct | High inflation erodes real wages and adds price instability | ↓ lower = better |
| manufacturing_pct_gdp | A large industrial base can suppress marginal costs via supplier density | ↑ higher = better |
Method
- Pull the z-scored columns from scaled_df.
- Invert inflation_pct (multiply by −1) so that lower inflation registers as "higher is better".
- Run a one-component PCA.
- Auto-orient the component so the loading on wage_usd is negative (i.e., lower wages boost the score). This fixes the sign of the wage loading only; the signs of the other loadings follow from the data and must be checked in the output.
- Append cost_score to both master_df and scaled_df, and print diagnostics.
# --- PCA · Pillar 2 : Cost Competitiveness ----------------------------------
print("— PCA on Pillar 2: Cost Competitiveness —")
# 1️⃣ Feature bundle
pillar2_features = ["wage_usd", "inflation_pct", "manufacturing_pct_gdp"]
# 2️⃣ Pull z‑scores and invert inflation so lower = better
df_p2 = scaled_df[pillar2_features].copy()
df_p2["inflation_pct"] *= -1 # <-- key inversion
# 3️⃣ Fit PCA (one component)
pca_p2 = PCA(n_components=1, random_state=42)
pc1_raw = pca_p2.fit_transform(df_p2).ravel()
# 4️⃣ Auto‑orient so wage loading is NEGATIVE (low wages ↑ score)
wage_loading = pca_p2.components_[0][pillar2_features.index("wage_usd")]
orient_factor = -np.sign(wage_loading) or 1
cost_score = pc1_raw * orient_factor
loadings_oriented = pd.Series(
pca_p2.components_[0] * orient_factor,
index=pillar2_features,
name="loading"
)
# 5️⃣ Append to dataframes
for df in (master_df, scaled_df):
    df["cost_score"] = cost_score
# 6️⃣ Diagnostics
expl_var = pca_p2.explained_variance_ratio_[0]
print(f"✅ PCA complete — PC‑1 captures {expl_var:.1%} of pillar variance.\n")
display(loadings_oriented.to_frame().style.format("{:+0.2f}"))
print("\nTop / bottom cost‑advantage scores (higher = cheaper):")
display(master_df[["country", "year", "cost_score"]]
.sort_values("cost_score", ascending=False).head())
display(master_df[["country", "year", "cost_score"]]
.sort_values("cost_score", ascending=True).head())
— PCA on Pillar 2: Cost Competitiveness — ✅ PCA complete — PC‑1 captures 42.7% of pillar variance.
| | loading |
|---|---|
| wage_usd | -0.70 |
| inflation_pct | -0.58 |
| manufacturing_pct_gdp | +0.42 |
Top / bottom cost‑advantage scores (higher = cheaper):
| | country | year | cost_score |
|---|---|---|---|
| 946 | Turkiye | 2022 | 8.76 |
| 892 | Sri Lanka | 2022 | 6.54 |
| 947 | Turkiye | 2023 | 6.27 |
| 953 | Ukraine | 2015 | 5.58 |
| 285 | Egypt, Arab Rep. | 2023 | 4.37 |

| | country | year | cost_score |
|---|---|---|---|
| 578 | Luxembourg | 2020 | -2.17 |
| 574 | Luxembourg | 2016 | -2.10 |
| 579 | Luxembourg | 2021 | -2.04 |
| 571 | Luxembourg | 2013 | -2.04 |
| 577 | Luxembourg | 2019 | -2.04 |
Interpreting the Cost Competitiveness PCA
| Diagnostic | Insight |
|---|---|
| Explained variance | PC-1 captures ~43 % of variance across the three inputs: acceptable compression given their mixed economic nature. |
| Loadings (after orientation) | • wage_usd loads negatively (−0.70), so lower wages lift the score, as intended. • manufacturing_pct_gdp loads positively (+0.42), rewarding a broad industrial base. • The inverted inflation_pct loads negatively (−0.58), so the score rises with raw inflation: the opposite of the intended direction. A plausible reading is that PC-1 captures currency-crisis episodes, in which US-dollar wages fall while inflation spikes. |
| Top scorers | Low-wage markets in high-inflation years (Türkiye 2022–23, Sri Lanka 2022, Ukraine 2015, Egypt 2023), consistent with the inflation loading above. |
| Bottom scorers | High-income, high-wage jurisdictions (e.g., Luxembourg) that may offer stability but little direct cost advantage. |
Take-away.
Low wages and industrial depth push a country up the cost_score as intended, but so does high inflation, so the top of this ranking partly reflects crisis years rather than durable cost advantage. This direction conflict should be reviewed before cost_score is read as a pure "higher = cheaper" input to the final attractiveness index.
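Orienting on one anchor fixes the sign of that loading only; the other loadings fall wherever the data put them. A small audit against the intended directions in the Inputs table makes any conflict explicit. This sketch hard-codes the loadings printed above (with `inflation_pct` already inverted); `direction_mismatches` is a hypothetical helper:

```python
# Oriented Pillar 2 loadings as printed above (inflation_pct was inverted before PCA)
loadings = {"wage_usd": -0.70, "inflation_pct": -0.58, "manufacturing_pct_gdp": 0.42}

# Intended loading signs on the model inputs, per the Inputs table:
#   wage_usd: lower wages should raise the score                 -> negative
#   inflation_pct (inverted): lower inflation should raise it    -> positive
#   manufacturing_pct_gdp: a bigger industrial base should raise -> positive
expected = {"wage_usd": -1, "inflation_pct": +1, "manufacturing_pct_gdp": +1}

def direction_mismatches(loadings, expected):
    """Features whose loading sign contradicts the intended direction."""
    return [name for name, sign in expected.items() if loadings[name] * sign < 0]

print(direction_mismatches(loadings, expected))   # → ['inflation_pct']
```

Running the same check after every pillar PCA turns a silent sign flip into a visible warning.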
5.3 Pillar 3 · Supply-Chain & Manufacturing Readiness
Goal. Assemble a metric that signals how well a country can host an EV-battery gigafactory, balancing logistics, industrial depth and in-country minerals.
Candidate variables
| Dimension | Variables | Why important | Desired direction |
|---|---|---|---|
| Logistics | lpi_score | Port, customs & freight reliability | ↑ better |
| Industrial base | industry_pct_gdp, access_to_electricity_pct | Scale of manufacturing and grid reach | ↑ better |
| Critical minerals | cobalt,_mine, graphite, lithium_minerals, manganese_ore, nickel,_mine | Local supply of battery inputs | ↑ better |
All inputs are already log-adjusted (where needed) and z-scored in scaled_df.
We start with a single-shot PCA to test whether one component can credibly summarise the pillar.
# --- Initial PCA : "one score to rule them all" -----------------------------
print("— Initial PCA on Pillar 3 —")
core_feats = ["lpi_score", "industry_pct_gdp", "access_to_electricity_pct"]
mineral_cols = ["cobalt,_mine", "graphite", "lithium_minerals",
"manganese_ore", "nickel,_mine"]
pillar3_features = core_feats + mineral_cols
df_p3 = scaled_df[pillar3_features]
pca_p3 = PCA(n_components=1, random_state=42)
pc1_raw = pca_p3.fit_transform(df_p3).ravel()
# Orient so higher LPI ⇒ higher score (fall back to +1 if the loading is exactly 0)
orient = np.sign(pca_p3.components_[0][pillar3_features.index("lpi_score")]) or 1
supply_raw = pc1_raw * orient
expl_var = pca_p3.explained_variance_ratio_[0]
loadings = (pd.Series(pca_p3.components_[0] * orient,
index=pillar3_features, name="loading")
.sort_values(key=abs, ascending=False))
print(f"✅ PC‑1 captures {expl_var:.1%} of pillar variance\n")
display(loadings.to_frame().style.format("{:+0.2f}"))
— Initial PCA on Pillar 3 — ✅ PC‑1 captures 32.2% of pillar variance
| | loading |
|---|---|
| cobalt,_mine | +0.53 |
| nickel,_mine | +0.49 |
| graphite | +0.40 |
| lithium_minerals | +0.38 |
| manganese_ore | +0.36 |
| industry_pct_gdp | +0.18 |
| lpi_score | +0.07 |
| access_to_electricity_pct | +0.05 |
Diagnostic — Why “One‑Score” PCA Falls Short
- Explained variance only 32 % — well below the 40 % threshold we set for a defensible one‑number summary.
- Loadings skewed toward minerals — the five ore‑tonnage columns swamp logistics and industrial capacity.
- Interpretability risk — executives could wrongly equate “good supply chain” with “just dig more nickel”.
Pivot.
We split the pillar into two steps:
- Build a dedicated mineral_index (PCA on the five ore variables).
- Combine that single mineral factor with lpi_score, industry_pct_gdp and access_to_electricity_pct in a balanced PCA or, if one input still dominates the variance, keep the items separate.
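# --- Step 1 · Mineral Abundance Index ---------------------------------------
pca_minerals = PCA(n_components=1, random_state=42)
mineral_index = pca_minerals.fit_transform(scaled_df[mineral_cols]).ravel()
# PCA signs are arbitrary: orient so that more ore tonnage ⇒ higher index
mineral_orient = np.sign(pca_minerals.components_[0].sum()) or 1
mineral_index = mineral_index * mineral_orient
# Append raw index (we'll z‑score later)
for df in (scaled_df, master_df):
    df["mineral_index_raw"] = mineral_index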
# --- Step 2 · Balanced PCA with 4 features ----------------------------------
supply_feats = ["lpi_score", "industry_pct_gdp", "access_to_electricity_pct",
"mineral_index_raw"]
df_supply = scaled_df[supply_feats]
pca_supply = PCA(n_components=1, random_state=42)
pc1_supply = pca_supply.fit_transform(df_supply).ravel()
orient = np.sign(pca_supply.components_[0][supply_feats.index("lpi_score")]) or 1 # fallback if loading is 0
supply_score = pc1_supply * orient
expl_var2 = pca_supply.explained_variance_ratio_[0]
loadings2 = (pd.Series(pca_supply.components_[0] * orient,
index=supply_feats, name="loading")
.sort_values(key=abs, ascending=False))
print(f"✅ Re‑run PCA — PC‑1 now captures {expl_var2:.1%} of variance\n")
display(loadings2.to_frame().style.format('{:+0.2f}'))
✅ Re‑run PCA — PC‑1 now captures 46.8% of variance
| | loading |
|---|---|
| mineral_index_raw | +0.98 |
| industry_pct_gdp | +0.18 |
| lpi_score | +0.07 |
| access_to_electricity_pct | +0.05 |
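Revised Findings
- Variance ↑ to 46.8 %: acceptable, but mineral_index_raw still dwarfs the other drivers.
- Loadings show mineral_index_raw (+0.98) dominating; logistics and industrial depth make only modest contributions. Because mineral_index_raw entered this PCA unscaled (it is z-scored only afterwards), part of its dominance may reflect its larger variance rather than its information content.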
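One reason to be cautious about the +0.98 loading: a first-stage PC score is not unit-variance. Its variance equals the component's eigenvalue, which is at least 1 for standardised inputs and larger whenever they are correlated, so an unscaled score fed into a second PCA tends to dominate it, just as unscaled raw variables would. A synthetic numpy sketch (all data invented; `pc1` and `zscore` are hypothetical helpers) contrasts the second-stage loading with and without re-standardising the score:

```python
import numpy as np

def pc1(Z):
    """Scores and loadings of the first principal component of Z."""
    Zc = Z - Z.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Zc, rowvar=False))
    v = eigvecs[:, -1]                       # largest-eigenvalue direction
    return Zc @ v, v

def zscore(A):
    """Column-wise z-score with ddof=0."""
    return (A - A.mean(axis=0)) / A.std(axis=0)

rng = np.random.default_rng(7)
n = 2000
# Five correlated "ore" columns sharing one latent factor (synthetic)
ore = zscore(rng.standard_normal((n, 1)) + 0.5 * rng.standard_normal((n, 5)))
# Three other z-scored drivers sharing a different latent factor (synthetic)
other = zscore(rng.standard_normal((n, 1)) + 0.5 * rng.standard_normal((n, 3)))

mineral_raw, _ = pc1(ore)                    # variance ≈ first eigenvalue, well above 1
print(f"variance of raw mineral score: {mineral_raw.var():.2f}")

# Second-stage PCA with the raw score vs with a re-standardised score
_, load_raw = pc1(np.column_stack([mineral_raw, other]))
_, load_std = pc1(np.column_stack([zscore(mineral_raw[:, None]).ravel(), other]))
print(f"|loading| on mineral, raw score:          {abs(load_raw[0]):.2f}")
print(f"|loading| on mineral, standardised score: {abs(load_std[0]):.2f}")
```

In this toy setup the raw score captures PC-1 almost entirely, while the standardised score barely registers, so re-standardising before a second-stage PCA is the safer default.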
Final design choice
Because the mineral factor dominated both attempts, we keep the dimensions separate for maximum transparency:
- mineral_index: summarises ore abundance.
- lpi_score: logistics quality.
- industry_pct_gdp: manufacturing depth.
These three z-scored metrics form Pillar 3's feature trio in the final clustering and attractiveness index; access_to_electricity_pct, which loaded only +0.05 in both runs, is not carried forward.
# --- Finalise Pillar‑3 features ---------------------------------------------
from sklearn.preprocessing import StandardScaler
# 1. Z‑score the raw mineral index so scale matches earlier features
scaler_min = StandardScaler()
scaled_df["mineral_index"] = scaler_min.fit_transform(
scaled_df[["mineral_index_raw"]]
)
master_df["mineral_index"] = scaled_df["mineral_index"]
# 2. Assemble convenience DataFrame for downstream steps
pillar3_final = scaled_df[["mineral_index", "lpi_score", "industry_pct_gdp"]].copy()
pillar3_final.head()
| | mineral_index | lpi_score | industry_pct_gdp |
|---|---|---|---|
| 0 | 0.01 | -1.47 | -0.12 |
| 1 | 0.10 | -1.51 | -0.55 |
| 2 | 0.07 | -1.56 | -0.63 |
| 3 | 0.09 | -1.32 | -0.74 |
| 4 | 0.07 | -1.09 | -0.28 |
5.4 Pillar 4 · Governance & Geopolitical Risk
Objective. Compress governance strength and conflict intensity into a single risk_score where higher = safer.
| Indicator | Raw meaning | Desired direction | Prep step |
|---|---|---|---|
| political_stability_est | Perceived likelihood of upheaval (higher estimate = less likely) | ↑ safer | none |
| control_of_corruption_est | Integrity of public sector | ↑ safer | none |
| rule_of_law_est | Contract & property security | ↑ safer | none |
| total_disorder_events | Protest / violence count (higher = riskier) | ↓ safer | invert (× −1) so "less conflict" → higher value |
All four columns are already z-scored in scaled_df; we invert the conflict variable, run a one-component PCA, and auto-orient so that the governance loadings are positive in aggregate.
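Multiplying a z-scored column by −1 before PCA does not change how much variance PC-1 captures; it only flips the sign of that column's loading. The inversion is therefore a labelling convenience, and it is the auto-orientation step that decides which end of the axis counts as safer. A numpy sketch on synthetic data (the factor structure is invented; `top_eigenpair` is a hypothetical helper) checks both facts:

```python
import numpy as np

def top_eigenpair(X):
    """Largest eigenvalue and matching eigenvector of X's covariance matrix."""
    vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return vals[-1], vecs[:, -1]

# Synthetic data: three governance items driven by one factor,
# plus a disorder count that falls as governance improves
rng = np.random.default_rng(1)
gov = rng.standard_normal(500)
X = np.column_stack([gov + 0.4 * rng.standard_normal(500) for _ in range(3)]
                    + [-0.5 * gov + rng.standard_normal(500)])

val_raw, vec_raw = top_eigenpair(X)

X_inv = X.copy()
X_inv[:, 3] *= -1                          # the "× −1" step applied to the disorder column
val_inv, vec_inv = top_eigenpair(X_inv)

flip = np.array([1, 1, 1, -1])
print(f"PC-1 eigenvalue: {val_raw:.4f} (raw) vs {val_inv:.4f} (inverted)")
print("same loadings up to the flipped column:",
      np.allclose(vec_inv, flip * vec_raw) or np.allclose(vec_inv, -flip * vec_raw))
```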
# --- PCA · Pillar 4 : Governance & Risk ------------------------------------
print("— PCA on Pillar 4: Governance & Risk —")
pillar4_feats = ["political_stability_est", "control_of_corruption_est",
"rule_of_law_est", "total_disorder_events"]
# 1️⃣ Copy z‑scores and invert the conflict metric so higher = safer
df_p4 = scaled_df[pillar4_feats].copy()
df_p4["total_disorder_events"] *= -1
# 2️⃣ Fit PCA (one component)
pca_p4 = PCA(n_components=1, random_state=42)
pc1_raw = pca_p4.fit_transform(df_p4).ravel()
# 3️⃣ Auto‑orient so the aggregate governance loading is POSITIVE
gov_load_sum = pca_p4.components_[0][:3].sum() # first three vars are governance
orient_factor = np.sign(gov_load_sum) or 1
risk_score = pc1_raw * orient_factor
loadings = pd.Series(
pca_p4.components_[0] * orient_factor,
index=pillar4_feats, name="loading"
)
# 4️⃣ Append to dataframes
for df in (master_df, scaled_df):
    df["risk_score"] = risk_score
# 5️⃣ Diagnostics
expl_var = pca_p4.explained_variance_ratio_[0]
print(f"✅ PC‑1 captures {expl_var:.1%} of variance\n")
display(loadings.to_frame().style.format('{:+0.2f}'))
print("\nTop / bottom risk scores (higher = safer):")
display(master_df[["country", "year", "risk_score"]]
.sort_values("risk_score", ascending=False).head())
display(master_df[["country", "year", "risk_score"]]
.sort_values("risk_score", ascending=True).head())
— PCA on Pillar 4: Governance & Risk — ✅ PC‑1 captures 69.8% of variance
| | loading |
|---|---|
| political_stability_est | +0.53 |
| control_of_corruption_est | +0.57 |
| rule_of_law_est | +0.57 |
| total_disorder_events | +0.26 |
Top / bottom risk scores (higher = safer):
| | country | year | risk_score |
|---|---|---|---|
| 679 | New Zealand | 2014 | 2.99 |
| 680 | New Zealand | 2015 | 2.99 |
| 682 | New Zealand | 2017 | 2.96 |
| 681 | New Zealand | 2016 | 2.96 |
| 691 | Norway | 2013 | 2.92 |

| | country | year | risk_score |
|---|---|---|---|
| 703 | Pakistan | 2011 | -4.12 |
| 704 | Pakistan | 2012 | -4.05 |
| 702 | Pakistan | 2010 | -3.96 |
| 705 | Pakistan | 2013 | -3.92 |
| 708 | Pakistan | 2016 | -3.75 |
Interpreting the Governance & Geopolitical Risk PCA
| Diagnostic | Insight |
|---|---|
| Explained variance ≈ 70 % | One component captures the lion’s share of variability—strong evidence of a common “stability” axis. |
| Loadings | • Governance trio ( rule_of_law_est, control_of_corruption_est, political_stability_est) all load positively and heavily (≈ +0.55).• Inverted conflict metric ( total_disorder_events) carries a positive loading (≈ +0.26) now that fewer events are “safer.” |
| Top scorers | New Zealand, Norway, and peers—robust institutions, negligible unrest. |
| Bottom scorers | Pakistan (early‑2010s) and similar—high protest/violence counts and weak governance. |
Take‑away.
risk_score is now directionally consistent: stronger institutions and fewer disorder events push the index up, while weaker governance and more unrest push it down. At ~70 % variance explained, this single metric is defensible for both clustering and the final attractiveness index.
6 · Constructing the Gigafactory Attractiveness Index
We now blend the six "higher-is-better" pillar inputs (the Pillar 1, 2 and 4 scores plus the three Pillar 3 metrics) into a single score.
To keep any input from dominating just because its variance is larger, each one is re-standardised (mean 0, σ 1) before weighting.
| Pillar input | Weight | Strategic rationale |
|---|---|---|
| market_score | 25 % | Revenue upside from sheer market scale |
| risk_score | 25 % | Political & institutional safety is non-negotiable |
| cost_score | 15 % | Sustained cost advantage matters |
| mineral_index | 15 % | In-country mineral supply cuts input risk |
| lpi_score | 15 % | Efficient logistics enable just-in-time production |
| industry_pct_gdp | 5 % | Existing industrial ecosystem deepens the talent/supplier pool |
Weights sum to 100 %.
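Once the inputs are re-standardised, the index is a plain weighted sum. A minimal sketch using the weights above; the z-values and the "risk-heavy" variant (10 points moved from market to risk) are hypothetical, chosen only to show how a board-level re-weighting shifts one country's index:

```python
# Weights from the table above
BASE_WEIGHTS = {
    "market_score": 0.25, "risk_score": 0.25, "cost_score": 0.15,
    "mineral_index": 0.15, "lpi_score": 0.15, "industry_pct_gdp": 0.05,
}

def attractiveness(z, weights):
    """Weighted sum of re-standardised pillar inputs."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[k] * z[k] for k in weights)

# Hypothetical re-standardised inputs for one country-year (illustrative only)
z = {"market_score": 1.0, "risk_score": 0.5, "cost_score": -0.5,
     "mineral_index": 0.0, "lpi_score": 1.0, "industry_pct_gdp": -1.0}

base = attractiveness(z, BASE_WEIGHTS)

# Hypothetical "risk-heavy" scenario: move 10 points from market to risk
risk_heavy = {**BASE_WEIGHTS, "market_score": 0.15, "risk_score": 0.35}
print(round(base, 3), round(attractiveness(z, risk_heavy), 3))   # 0.4 0.35
```

Because this country is stronger on market scale than on safety, shifting weight toward risk lowers its index.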
The first cell below calculates the weighted index and shows a distribution snapshot; the second cell ranks countries by their 2010‑23 average.
# --- Build Gigafactory Attractiveness Index ---------------------------------
pillar_cols = ["market_score", "risk_score", "cost_score",
"mineral_index", "lpi_score", "industry_pct_gdp"]
weights = pd.Series({
"market_score": 0.25,
"risk_score": 0.25,
"cost_score": 0.15,
"mineral_index": 0.15,
"lpi_score": 0.15,
"industry_pct_gdp": 0.05
}, name="weight")
assert abs(weights.sum() - 1.0) < 1e-6, "Weights must sum to 1."
# 1️⃣ Re‑standardise each pillar input
scaler_tmp = StandardScaler()
z_pillars = pd.DataFrame(
scaler_tmp.fit_transform(master_df[pillar_cols]),
columns=pillar_cols,
index=master_df.index
)
# 2️⃣ Weighted sum (alignment by column name)
master_df["attractiveness_index"] = z_pillars.mul(weights).sum(axis=1)
print("✅ Attractiveness Index calculated.\nDistribution snapshot:")
display(master_df["attractiveness_index"].describe(percentiles=[.1, .5, .9]))
✅ Attractiveness Index calculated. Distribution snapshot:
| statistic | attractiveness_index |
|---|---|
| count | 1,015.00 |
| mean | -0.00 |
| std | 0.53 |
| min | -1.10 |
| 10% | -0.69 |
| 50% | 0.02 |
| 90% | 0.62 |
| max | 1.51 |
# --- Top‑10 countries by average Attractiveness Index -----------------------
avg_ranking = (master_df
.groupby("country", as_index=False)
.agg(avg_index=("attractiveness_index", "mean"))
.sort_values("avg_index", ascending=False)
.reset_index(drop=True))
print("🔝 Ten most attractive countries (mean 2010‑23):")
display(avg_ranking.head(10))
🔝 Ten most attractive countries (mean 2010‑23):
| | country | avg_index |
|---|---|---|
| 0 | China | 1.37 |
| 1 | Australia | 0.97 |
| 2 | Canada | 0.91 |
| 3 | United States | 0.87 |
| 4 | Germany | 0.81 |
| 5 | Japan | 0.74 |
| 6 | Finland | 0.63 |
| 7 | Norway | 0.59 |
| 8 | Sweden | 0.59 |
| 9 | Brazil | 0.56 |
7 · Segment Countries, Then Plot the Recommendation Matrix
Colour‑coding the 2 × 2 by data‑driven clusters answers two questions at once:
- Which peer group does each country belong to (safe‑but‑pricey, risky‑but‑cheap, etc.)?
- Are there outliers that defy their peer group and deserve a closer look?
Feature set used for clustering – the six z‑scored pillars/sub‑pillars:
market_score • cost_score • mineral_index • lpi_score • industry_pct_gdp • risk_score
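# ─── Optimal‑k dashboard: Elbow • Silhouette • Calinski‑Harabasz • Davies‑Bouldin ──
import numpy as np, pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
davies_bouldin_score)
plt.rcParams["font.family"] = "Arial Unicode MS" # or any full-Unicode font installed locally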
# --------- config -----------------------------------------------------------
max_k = 10 # test k = 2 … max_k
n_init = 10 # stabilise results
random_state = 42
feature_cols = ["market_score","cost_score","mineral_index",
"lpi_score","industry_pct_gdp","risk_score"]
X = master_df[feature_cols].apply(lambda c: (c - c.mean())/c.std())
# --------- loop over k ------------------------------------------------------
ks, inertia, sil, ch, db = [], [], [], [], []
for k in range(2, max_k+1):
    km = KMeans(n_clusters=k, random_state=random_state, n_init=n_init)
    labels = km.fit_predict(X)
    ks.append(k)
    inertia.append(km.inertia_)
    sil.append(silhouette_score(X, labels))
    ch.append(calinski_harabasz_score(X, labels))
    db.append(davies_bouldin_score(X, labels))
# --------- plot dashboard ---------------------------------------------------
fig, axes = plt.subplots(2, 2, figsize=(10, 7))
axes = axes.ravel()
axes[0].plot(ks, inertia, 'o-'); axes[0].set_title("Elbow: Inertia ↓"); axes[0].set_xlabel("k")
axes[1].plot(ks, sil, 'o--', color='tab:red'); axes[1].set_title("Silhouette ↑"); axes[1].set_xlabel("k")
axes[2].plot(ks, ch, 'o-', color='tab:green'); axes[2].set_title("Calinski‑Harabasz ↑"); axes[2].set_xlabel("k")
axes[3].plot(ks, db, 'o--', color='tab:purple'); axes[3].set_title("Davies‑Bouldin ↓"); axes[3].set_xlabel("k")
for ax in axes: ax.grid(alpha=0.3)
plt.suptitle("Optimal‑k Diagnostics", y=1.02, fontsize=14)
plt.tight_layout(); plt.show()
# --------- print metric‑wise suggestions ------------------------------------
def arg_extreme(arr, mode="max"):
    """Return the k at which `arr` peaks (mode='max') or bottoms out (mode='min')."""
    return ks[int(np.argmax(arr))] if mode=="max" else ks[int(np.argmin(arr))]
print("Suggested k by metric:")
print(f"• Silhouette peak ............ k = {arg_extreme(sil, 'max')}")
print(f"• Calinski‑Harabasz peak ..... k = {arg_extreme(ch, 'max')}")
print(f"• Davies‑Bouldin minimum ..... k = {arg_extreme(db, 'min')}")
print()
# --------- consensus heuristic ----------------------------------------------
votes = pd.Series([arg_extreme(sil,'max'),
arg_extreme(ch,'max'),
arg_extreme(db,'min')]).value_counts()
consensus_k = votes.idxmax()
print(f"Consensus suggestion (mode of three metrics) → k = {consensus_k}")
Suggested k by metric: • Silhouette peak ............ k = 2 • Calinski‑Harabasz peak ..... k = 2 • Davies‑Bouldin minimum ..... k = 9 Consensus suggestion (mode of three metrics) → k = 2
Outcome
- Silhouette and Calinski-Harabasz both peak at k = 2; Davies-Bouldin alone favours k = 9, so the three-metric vote settles on k = 2.
- Inertia exhibits a clear elbow between 2 and 3; adding a third cluster yields only marginal separation.
We therefore proceed with k = 2:
1) keeps the solution statistically clean, and
2) delivers an easy “invest‑now vs. watch‑list” narrative for executives.
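The consensus rule is a simple majority vote across the three metrics. A pure-Python sketch; `consensus_k` is a hypothetical helper, and breaking ties toward the smaller k is an assumption, since the notebook's `value_counts().idxmax()` does not make its tie rule explicit:

```python
from collections import Counter

def consensus_k(suggestions):
    """Most frequent k across metrics; ties broken toward the smaller k (assumption)."""
    counts = Counter(suggestions)
    top = max(counts.values())
    return min(k for k, c in counts.items() if c == top)

# Picks reported above: silhouette, Calinski-Harabasz, Davies-Bouldin
print(consensus_k([2, 2, 9]))   # → 2
print(consensus_k([2, 3, 9]))   # three-way tie → 2 under the smaller-k rule
```

Preferring the smaller k on ties matches the goal stated above of keeping the executive narrative simple.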
# --- 7.2 Fit K‑Means (k = 2) • label personas • build profile -------------
from sklearn.cluster import KMeans
import pandas as pd, numpy as np
k_final = 2
kmeans = KMeans(n_clusters=k_final, random_state=42, n_init=10)
master_df["cluster_id"] = kmeans.fit_predict(X)
# ----- map numeric IDs → business personas via centroid logic --------------
cent = pd.DataFrame(kmeans.cluster_centers_, columns=feature_cols)
safe_id = cent["risk_score"].idxmax() # safest centroid
frontier_id = 1 - safe_id
label_map = {safe_id: "Safe Mature Hubs",
frontier_id:"Risk‑Weighted Frontiers"}
palette = {"Safe Mature Hubs":"#007E8C",
"Risk‑Weighted Frontiers":"#E67800"}
master_df["cluster_label"] = master_df["cluster_id"].map(label_map)
# ----- z‑score profile ------------------------------------------------------
profile = (pd.concat([X, master_df["cluster_label"]], axis=1)
.groupby("cluster_label")[feature_cols]
.mean().round(2)
.assign(count=master_df["cluster_label"].value_counts()))
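display(profile.style
.format("{:+.2f}", subset=feature_cols) # signed z-scores for the pillars
.format("{:,.0f}", subset=["count"]) # plain integer for cluster sizes
.set_caption("Cluster profile • z‑scores (k = 2)"))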
| cluster_label | market_score | cost_score | mineral_index | lpi_score | industry_pct_gdp | risk_score | count |
|---|---|---|---|---|---|---|---|
| Risk‑Weighted Frontiers | -0.18 | +0.58 | +0.07 | -0.67 | +0.37 | -0.69 | 575 |
| Safe Mature Hubs | +0.23 | -0.76 | -0.10 | +0.87 | -0.48 | +0.90 | 440 |
Safe Mature Hubs → safer (risk_score +0.90) with strong logistics (lpi_score +0.87), but less cost-competitive (cost_score −0.76)
Risk‑Weighted Frontiers → more cost-competitive (cost_score +0.58), weaker governance (risk_score −0.69), modest market scale (market_score −0.18)
7.2 · Sense-Check of the 2-Cluster Solution
| Cluster | Key z-score signature | Business persona | Obs. |
|---|---|---|---|
| Safe Mature Hubs | risk_score ↑↑ (safer) • lpi_score ↑ • cost_score ↓↓ (pricier) | Large, institutionally safe but high-wage (US, Germany, Japan, Canada, Australia) | 440 |
| Risk‑Weighted Frontiers | cost_score ↑ (cheaper) • risk_score ↓↓ (riskier) | Lower-cost markets that need governance wraps (Indonesia, India, Brazil) | 575 |
# --- 7.3 Observation‑level 2×2 (k = 2) ------------------------------------
import plotly.express as px
# Shift market_score to strictly positive sizes, then cap the top 5 % of the SHIFTED values
shifted = master_df["market_score"] - master_df["market_score"].min() + 0.1
bubble = shifted.clip(upper=shifted.quantile(.95))
fig = px.scatter(
master_df, x="attractiveness_index", y="risk_score",
color="cluster_label", size=bubble,
color_discrete_map=palette,
category_orders={"cluster_label": list(palette)},
hover_name="country", hover_data=["year"],
labels={"attractiveness_index":"Attractiveness (↑ better)",
"risk_score":"Risk (↑ safer)"},
template="plotly_white",
title="Gigafactory 2×2 — Attractiveness vs Risk<br>"
"<sup>Teal = Safe Mature Hubs • Amber = Risk‑Weighted Frontiers • Bubble = market scale</sup>",
width=950, height=560)
fig.add_vline(master_df["attractiveness_index"].median(), line_dash="dot", line_color="gray")
fig.add_hline(master_df["risk_score"].median(), line_dash="dot", line_color="gray")
fig.update_layout(title_x=0.5, legend_title_text="Cluster"); fig.show()
How to read this chart
- Teal (Safe Mature Hubs) dominate the upper‑right quadrant → launch‑today candidates: good upside and governance.
- Amber (Risk‑Weighted Frontiers) spreads across the lower‑right quadrant (attractive but riskier) and the left‑hand quadrants → growth and/or mineral upside, but governance mitigations (JV structures, political‑risk insurance) are required.
- Bubble size continues to show relative market scale within each colour band.
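The dotted lines sit at the medians of the two plotted columns, so quadrant membership comes down to two "at or above the median" tests. The minimal sketch below uses a hypothetical helper, label_quadrants (not part of the notebook), on made‑up values.

```python
import pandas as pd

def label_quadrants(df, x="attractiveness_index", y="risk_score"):
    """Tag each row with its 2x2 quadrant, splitting both axes at the column medians."""
    x_hi = df[x] >= df[x].median()   # right-hand side of the vertical dotted line
    y_hi = df[y] >= df[y].median()   # above the horizontal dotted line
    names = {(True, True): "upper-right", (True, False): "lower-right",
             (False, True): "upper-left", (False, False): "lower-left"}
    return pd.Series([names[(bool(a), bool(b))] for a, b in zip(x_hi, y_hi)],
                     index=df.index)

# Hypothetical observations (not project data).
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "attractiveness_index": [0.9, 0.8, -0.2, -0.5],
    "risk_score": [1.5, -0.7, 0.9, -1.1],
})
toy["quadrant"] = label_quadrants(toy)
print(toy[["country", "quadrant"]])
```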
# --- 7.4 Country‑average 2×2 (k = 2) -------------------------------------
country_avg = (master_df
.groupby("country", as_index=False)
.agg(mean_attr=("attractiveness_index","mean"),
mean_risk=("risk_score","mean"),
mean_market=("market_score","mean"),
cluster_label=("cluster_label", lambda s: s.mode()[0])))
shifted_c = country_avg["mean_market"] - country_avg["mean_market"].min() + 0.1   # strictly positive sizes
bubble_c = shifted_c.clip(upper=shifted_c.quantile(.95))                          # cap outliers at their own 95th percentile
fig = px.scatter(
country_avg, x="mean_attr", y="mean_risk",
color="cluster_label", size=bubble_c,
color_discrete_map=palette,
hover_name="country",
labels={"mean_attr":"Mean Attractiveness (↑)",
"mean_risk":"Mean Risk (↑ safer)"},
template="plotly_white",
title="Country Portfolio 2×2 (Avg 2010‑23)<br>"
"<sup>Teal = Safe • Amber = Risk‑Weighted</sup>",
width=900, height=540)
fig.add_vline(country_avg["mean_attr"].median(), line_dash="dot", line_color="gray")
fig.add_hline(country_avg["mean_risk"].median(), line_dash="dot", line_color="gray")
fig.update_layout(title_x=0.5, legend_title_text="Cluster"); fig.show()
# --- 7.5 Global cluster map (k = 2) ---------------------------------------
fig = px.choropleth(
country_avg, locations="country", locationmode="country names",
color="cluster_label", color_discrete_map=palette,
hover_data={"mean_attr":":.2f","mean_risk":":.2f"},
title="Strategic Cluster Map • Avg 2010‑23 (Teal = Safe, Amber = Risk‑Weighted)",
template="plotly_white")
fig.update_geos(projection_type="natural earth", showcountries=True,
countrycolor="white", showland=True, landcolor="#F2F2F2",
showocean=True, oceancolor="#E8F7FF")
fig.update_traces(marker_line_color="white", marker_line_width=0.4)
fig.update_layout(title_x=0.5, legend_title_text="Cluster",
width=1200, height=600); fig.show()
8 · Extract the Short‑List — High Opportunity & Low Risk¶
Filter logic
1. Keep countries whose mean Attractiveness and mean Risk meet or exceed the portfolio medians (upper‑right quadrant of the country 2 × 2).
2. Rank the survivors by mean Attractiveness; take the top five.
3. Display cluster, average risk, and market‑scale z‑score for each finalist.
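The three filter steps can be sketched as one small function on toy data. The function upper_right_shortlist and the toy frame are hypothetical. The logic mirrors the query in 8.1, including the inclusive "at or above the median" test.

```python
import pandas as pd

def upper_right_shortlist(avg: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Keep rows at or above both medians, then rank by mean attractiveness and take the top n."""
    x_med, y_med = avg["mean_attr"].median(), avg["mean_risk"].median()
    keep = avg[(avg["mean_attr"] >= x_med) & (avg["mean_risk"] >= y_med)]   # step 1: quadrant gate
    return (keep.sort_values("mean_attr", ascending=False)                   # step 2: rank survivors
                .head(n)
                .reset_index(drop=True))

# Hypothetical country averages (not project data).
toy_avg = pd.DataFrame({
    "country":   ["P", "Q", "R", "S", "T", "U"],
    "mean_attr": [0.9, 0.7, 0.6, 0.1, -0.3, 0.8],
    "mean_risk": [1.2, 0.9, -0.4, 1.0, -0.9, 0.2],
})
picks = upper_right_shortlist(toy_avg, n=5)
print(picks)
```

U has the second‑highest attractiveness but sits below the risk median, so step 1 drops it before ranking. When fewer than n countries survive the gate, the shortlist is simply shorter.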
# --- 8.1 Build short‑list --------------------------------------------------
x_med, y_med = country_avg["mean_attr"].median(), country_avg["mean_risk"].median()
shortlist = (country_avg.query("mean_attr >= @x_med and mean_risk >= @y_med")
.sort_values("mean_attr", ascending=False)
.head(5)
.loc[:, ["country", "cluster_label",
"mean_attr", "mean_risk", "mean_market"]]
.round(2)
.rename(columns={"country": "Country",
"cluster_label": "Cluster",
"mean_attr": "Avg Attractiveness",
"mean_risk": "Avg Risk",
"mean_market": "Market Scale (z)"})
.reset_index(drop=True))
display(shortlist)
| | Country | Cluster | Avg Attractiveness | Avg Risk | Market Scale (z) |
|---|---|---|---|---|---|
| 0 | Australia | Safe Mature Hubs | 0.97 | 2.20 | 1.20 |
| 1 | Canada | Safe Mature Hubs | 0.91 | 2.16 | 1.54 |
| 2 | United States | Safe Mature Hubs | 0.87 | 1.16 | 3.57 |
| 3 | Germany | Safe Mature Hubs | 0.81 | 1.92 | 2.48 |
| 4 | Japan | Safe Mature Hubs | 0.74 | 1.76 | 2.64 |
Short‑List Interpretation — Upper‑Right Quadrant Leaders¶
| Rank | Country | Why it rises to the top |
|---|---|---|
| 1 | Australia | World‑class governance and largest battery‑mineral base among safe markets. |
| 2 | Canada | Comparable governance to Australia, bigger domestic demand, abundant critical minerals. |
| 3 | United States | Vast market and top logistics; higher labour cost offset by scale. |
| 4 | Germany | EU logistics hub with deep supplier ecosystem; governance premium. |
| 5 | Japan | Tech‑savvy, demand‑rich, and politically stable. |
All five fall in the Safe Mature Hubs cluster, offering low execution risk for an initial $500 M gigafactory.
# --- 8.2 Bar chart • attractiveness length, risk colour -------------------
import plotly.express as px
fig = px.bar(
shortlist.sort_values("Avg Attractiveness"),
x="Avg Attractiveness",
y="Country",
orientation="h",
color="Avg Risk",
color_continuous_scale="Greens",
    title="Finalist Countries — Attractiveness vs Risk<br>(bar length = attractiveness, shade = safety)",
    labels={"Avg Attractiveness":"Average Attractiveness (2010‑23)",
            "Avg Risk":"Average Risk (2010‑23)"},
    template="plotly_white",
    width=750, height=350
)
# Plotly renders <br> (not \n) as a line break in titles
fig.update_layout(title_x=0.5,
                  coloraxis_colorbar=dict(title="Risk<br>(higher = safer)"))
fig.show()
9 · “What‑If?” Sensitivity Check — Do Our Finalists Stay on Top?¶
We stress‑test the ranking under three weighting schemes, the Baseline plus two alternatives:
| Scenario | Weight vector (Market • Risk • Cost • Minerals • LPI • Industry) |
|---|---|
| Baseline | 25 % · 25 % · 15 % · 15 % · 15 % · 5 % |
| Risk‑heavy | 15 % · 40 % · 15 % · 10 % · 15 % · 5 % |
| Cost‑heavy | 20 % · 20 % · 30 % · 10 % · 15 % · 5 % |
For each scenario we
1. re‑compute an index from the z‑scored pillars,
2. take the 2010‑23 mean per country, and
3. examine how the five finalists behave across scenarios.
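Each scenario index is a weighted sum (a dot product) of the six z‑scored pillars. The sketch below applies the weight table above to two hypothetical countries with made‑up z‑scores (not project data). It shows how a change in weights alone can flip the leader.

```python
import pandas as pd

pillars = ["market_score", "risk_score", "cost_score",
           "mineral_index", "lpi_score", "industry_pct_gdp"]

# Weight vectors copied from the scenario table (Market, Risk, Cost, Minerals, LPI, Industry).
scenarios = {
    "Baseline":   [0.25, 0.25, 0.15, 0.15, 0.15, 0.05],
    "Risk-heavy": [0.15, 0.40, 0.15, 0.10, 0.15, 0.05],
    "Cost-heavy": [0.20, 0.20, 0.30, 0.10, 0.15, 0.05],
}
for name, w in scenarios.items():                     # each scheme must total 100 %
    assert abs(sum(w) - 1) < 1e-9, f"{name} weights do not sum to 1"

# Hypothetical country-mean z-scores (illustration only, not project data).
z_mean = pd.DataFrame(
    [[2.0, -0.5,  1.5, 0.0, 0.5, 1.0],    # BigCheap: large market, weak governance, low cost
     [0.5,  2.0, -1.0, 1.0, 1.0, 0.0]],   # SafeHub: strong governance, expensive
    index=["BigCheap", "SafeHub"], columns=pillars)

# Index = row-wise weighted sum; the list broadcasts across the six columns in order.
scores = pd.DataFrame({name: (z_mean * w).sum(axis=1) for name, w in scenarios.items()})
print(scores.round(3))
print(scores.idxmax())    # winner per scenario
```

SafeHub leads under Baseline and Risk‑heavy, while BigCheap overtakes it under Cost‑heavy, even though neither country's pillar values changed.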
# --- 9.1 Index under three weighting schemes --------------------------------
scenarios = {
"Baseline": [0.25, 0.25, 0.15, 0.15, 0.15, 0.05],
"Risk‑heavy": [0.15, 0.40, 0.15, 0.10, 0.15, 0.05],
"Cost‑heavy": [0.20, 0.20, 0.30, 0.10, 0.15, 0.05]
}
pillar_cols = ["market_score","risk_score","cost_score",
"mineral_index","lpi_score","industry_pct_gdp"]
# 1️⃣ Build the z‑scored pillar frame (X_df): column‑wise (value − mean) / std
X_df = master_df[pillar_cols].apply(lambda c: (c - c.mean())/c.std())
# 2️⃣ Compute scenario indices per row
for name, w in scenarios.items():
    master_df[f"index_{name}"] = (X_df * w).sum(axis=1)   # weights broadcast across the six pillar columns
# 3️⃣ Country‑level means for ranking
country_scores = {name: master_df.groupby("country")[f"index_{name}"].mean()
for name in scenarios}
# 4️⃣ Finalist panel (same five as shortlist)
finalists = shortlist["Country"].tolist()
panel = (pd.DataFrame(country_scores)
.loc[finalists]
.round(2)
.rename_axis("Country"))
display(panel.style.format("{:+.2f}").set_caption("Finalists — Scenario Scores"))
| Country | Baseline | Risk‑heavy | Cost‑heavy |
|---|---|---|---|
| Australia | +0.97 | +0.92 | +0.47 |
| Canada | +0.91 | +0.89 | +0.51 |
| United States | +0.87 | +0.69 | +0.50 |
| Germany | +0.81 | +0.84 | +0.58 |
| Japan | +0.74 | +0.76 | +0.55 |
Visual goal¶
- Show the Top‑10 countries under each weighting scheme.
- Colour logic
  - Baseline – our five finalists in indigo (#0B0055), each successive rank a lighter tint; all others grey.
  - Risk‑heavy / Cost‑heavy – finalists stay indigo; countries that reach that scenario's Top‑5 without being finalists appear in orange (#F86302) tints; all others grey.
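The colour rules reduce to two small operations: lightening a base hex colour by rank, and finding Top‑5 entrants that are not finalists. The sketch below is a dependency‑free variant of the notebook's tint helper plus a hypothetical newcomers function. One assumption: the sketch rounds to integer RGB directly rather than going through matplotlib.colors, so a channel may differ by one unit. The scenario Top‑5 list is made up.

```python
def tint(hex_colour: str, idx: int, step: float = 0.15) -> str:
    """Blend a '#RRGGBB' colour toward white; idx 0 returns the base colour."""
    r, g, b = (int(hex_colour[i:i + 2], 16) for i in (1, 3, 5))
    factor = 1 - step * idx          # 1.00, 0.85, 0.70, ... of the original distance from white
    blended = (round(255 - (255 - ch) * factor) for ch in (r, g, b))
    return "#{:02X}{:02X}{:02X}".format(*blended)


def newcomers(scenario_top5: list[str], finalists: list[str]) -> list[str]:
    """Scenario Top-5 members that are not baseline finalists, kept in scenario rank order."""
    return [c for c in scenario_top5 if c not in finalists]


finalists = ["Australia", "Canada", "United States", "Germany", "Japan"]
toy_top5 = ["Australia", "Canada", "Germany", "China", "Finland"]   # hypothetical scenario Top-5
new = newcomers(toy_top5, finalists)
print(new)                                            # newcomers in scenario rank order
print([tint("#F86302", i) for i in range(len(new))])  # darker orange = higher scenario rank
```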
# --- 9.3 Top‑10 charts • finalists indigo • NEW Top‑5 entrants orange ------
import matplotlib.colors as mcolors   # hex ↔ RGB helpers used by tint()
import plotly.express as px
# ---------- config ----------------------------------------------------------
finalists = shortlist["Country"].tolist() # 5 baseline finalists
indigo_hex = "#0B0055"
orange_hex = "#F86302"
grey_hex = "#D3D3D3"
def tint(hexcol, idx):
    """Blend hexcol toward white: idx 0 keeps the base colour, each step is 15 % lighter."""
    r, g, b = mcolors.hex2color(hexcol)
    factor = 1 - 0.15 * idx
    return mcolors.to_hex(tuple(1 - (1 - c) * factor for c in (r, g, b)))
# ---------- helper to build coloured Top‑10 df ------------------------------
def prep_df(series, newcomers=None):
    df = (series.nlargest(10).to_frame("Score").reset_index())
    df.columns = ["Country", "Score"]
    colours = []
    for c in df["Country"]:
        if c in finalists:
            colours.append(tint(indigo_hex, finalists.index(c)))   # finalist: indigo by baseline rank
        elif newcomers and c in newcomers:
            colours.append(tint(orange_hex, newcomers.index(c)))   # newcomer: orange by scenario rank
        else:
            colours.append(grey_hex)
    df["Colour"] = colours
    return df.sort_values("Score") # low→high for bottom‑up bars
def make_bar(df, title):
    fig = px.bar(df, x="Score", y="Country", orientation="h",
                 color="Colour", color_discrete_map="identity",
                 text="Score", template="plotly_white",
                 labels={"Score":"Index Score","Country":""},
                 width=520, height=350, title=title)
    fig.update_traces(texttemplate="%{text:.2f}", textposition="outside")
    fig.update_layout(showlegend=False, bargap=0.3, title_x=0.5)
    return fig
# ---------- derive newcomer lists (Top‑5 only) ------------------------------
baseline_key = [k for k in scenarios if k.lower().startswith("base")][0]
risk_key = [k for k in scenarios if k.lower().startswith("risk")][0]
cost_key = [k for k in scenarios if k.lower().startswith("cost")][0]
baseline_top5 = country_scores[baseline_key].nlargest(5).index.tolist()
risk_new = [c for c in country_scores[risk_key].nlargest(5).index
if c not in finalists]
cost_new = [c for c in country_scores[cost_key].nlargest(5).index
if c not in finalists]
# ---------- build & show charts ---------------------------------------------
fig1 = make_bar(prep_df(country_scores[baseline_key]), f"<b>{baseline_key}</b> — Top 10")
fig2 = make_bar(prep_df(country_scores[risk_key], risk_new), f"<b>{risk_key}</b> — Top 10")
fig3 = make_bar(prep_df(country_scores[cost_key], cost_new), f"<b>{cost_key}</b> — Top 10")
fig1.show(); fig2.show(); fig3.show()
Interpreting the weight‑sensitivity Top‑10 panels¶
| Colour key | Meaning |
|---|---|
| Indigo shades | The original five finalists (darker = higher Baseline rank). |
| Orange shades | Countries that appear in the Top‑5 only after the weight shift – darker = higher rank in that scenario. |
| Light grey | All other countries. |
1 · Baseline weighting¶
(25 % Market • 25 % Risk • 15 % each Cost / Minerals / Logistics • 5 % Industry)
- Australia and Canada hold the top two slots (deep indigo).
- United States, Germany, Japan complete the finalist set within the Top‑5.
- China sits at #6 (long grey bar) – enormous market, but governance keeps it off the finalist list.
2 · Risk‑heavy scenario¶
(Risk weight lifted to 40 %; Market cut to 15 %; Minerals cut to 10 %)
- The five finalists stay inside the Top‑10, led again by Australia and Canada.
- China climbs into the Top‑5 (bright‑orange #4), even though this scenario trims the Market weight that powers its score.
- Finland sneaks into #5 (light‑orange) on the back of a stellar Risk score.
- No other newcomers: tilting aggressively toward governance adds only these two contenders to the Top‑5 alongside the indigo group.
3 · Cost‑heavy scenario¶
(Cost weight raised to 30 %; Market & Risk trimmed to 20 % each, Minerals to 10 %)
- China surges to a clear #1 (deep orange) – low cost plus huge market.
- Hungary, Türkiye and Czechia enter the Top‑5 (orange tints) as lower‑wage locations inside or next to the EU.
Key take‑aways¶
| Observation | Implication |
|---|---|
| Even at a 40 % Risk weight, all five finalists stay in the Top‑10. | Their governance advantage remains decisive. |
| China appears in the Top‑5 under both alternative scenarios. | Market scale + cost edge overpower the governance drag once other pillars are slightly discounted. |
| Finland only surfaces under Risk‑heavy. | Governance stars with moderate cost can edge in when risk is paramount. |
| Cost‑heavy introduces three CEE/MENA countries. | Low labour cost is the only lever strong enough to displace mature hubs; however, these entrants lack governance and market depth. |
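A rank‑shift table makes statements such as "all five finalists stay in the Top‑10" directly checkable. The minimal sketch below uses a hypothetical rank_shift helper and toy scores (not project data). A positive shift means the country drops that many places versus Baseline.

```python
import pandas as pd

def rank_shift(scores: pd.DataFrame, base: str = "Baseline"):
    """Rank countries within each scenario column (1 = best) and compare with the base column."""
    ranks = scores.rank(ascending=False, method="min").astype(int)
    shifts = ranks.drop(columns=base).sub(ranks[base], axis=0)   # > 0 means the country fell
    return ranks, shifts

# Hypothetical scenario scores for three countries (not project data).
toy = pd.DataFrame({"Baseline":   [0.90, 0.70, 0.50],
                    "Risk-heavy": [0.95, 0.40, 0.60],
                    "Cost-heavy": [0.30, 0.80, 0.60]},
                   index=["A", "B", "C"])
ranks, shifts = rank_shift(toy)
print(ranks)
print(shifts)
```

Applied to the notebook's country_scores dictionary (wrapped in a DataFrame), checking ranks.loc[finalists] <= 10 would confirm the Top‑10 claim in one line.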
11 · Pillar‑Contribution “Tornado” Charts¶
To translate rank order into actionable insight we decompose each finalist’s Baseline index into weighted pillar contributions.
- Bar length = contribution magnitude (z‑score × weight)
- Label = share of the total index (%)
- Colour (optional) = strategic pillar for quick eye‑tracking
Long bars reveal what makes the country stand out; short or negative bars expose relative weaknesses. One interactive chart is produced per finalist—zoom or export directly from the Plotly toolbar.
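The decomposition is plain arithmetic. Each contribution is the country‑mean z‑score × the Baseline weight, and each bar label divides that contribution by the sum of all six. The worked sketch below uses hypothetical z‑scores, not any finalist's actual values.

```python
import pandas as pd

# Baseline weights, as used in the tornado cell.
weights = pd.Series({"market_score": 0.25, "risk_score": 0.25, "cost_score": 0.15,
                     "mineral_index": 0.15, "lpi_score": 0.15, "industry_pct_gdp": 0.05})

# Hypothetical country-mean z-scores for one finalist (illustration only).
mean_z = pd.Series({"market_score": 1.2, "risk_score": 2.0, "cost_score": -1.0,
                    "mineral_index": 2.4, "lpi_score": 1.0, "industry_pct_gdp": 0.0})

contrib = mean_z * weights                  # bar length: weighted contribution of each pillar
total = contrib.sum()                       # the country's Baseline index (here 1.16)
share = (contrib / total * 100).round(0)    # bar label in %; negative for drags
print(contrib.round(3))
print(share)
```

The shares are divided by the net total, so a drag gets a negative share and the positive shares can sum to more than 100 %. If a country's total were close to zero, the percentages would blow up. The labels are therefore only informative for clearly positive indices such as the finalists'.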
# --- 11.1 Tornado charts for each finalist ---------------------------------
import plotly.express as px
import pandas as pd
# 1️⃣ Inputs ---------------------------------------------------------------
finalists = shortlist["Country"].tolist()
pillar_cols = ["market_score","risk_score","cost_score",
"mineral_index","lpi_score","industry_pct_gdp"]
# Baseline weights as a Series aligned to pillar_cols
baseline_w = pd.Series({
"market_score":0.25, "risk_score":0.25, "cost_score":0.15,
"mineral_index":0.15, "lpi_score":0.15, "industry_pct_gdp":0.05
})
# Use the z‑scored dataframe X_df built in Section 9
z_df = master_df[["country"] + pillar_cols].copy()
z_df[pillar_cols] = X_df
# 2️⃣ Build contribution table --------------------------------------------
records = []
for c in finalists:
    mean_z = z_df.loc[z_df["country"] == c, pillar_cols].mean()   # country-mean z-scores
    contrib = mean_z * baseline_w                                  # weighted pillar contributions
    total = contrib.sum()                                          # = Baseline index for c
    for p in pillar_cols:
        records.append({
            "Country": c,
            "Pillar": p.replace("_", " ").title(),
            "Contribution": contrib[p],
            "Share %": f"{(contrib[p]/total*100):.0f}%"
        })
contrib_df = pd.DataFrame(records)
# 3️⃣ Colour palette (toggle colourful=False for monochrome) ---------------
colourful = True
palette = {"Market Score":"#003F5C", "Risk Score":"#BC5090",
"Cost Score":"#FFA600", "Mineral Index":"#58508D",
"Lpi Score":"#2F4B7C", "Industry Pct Gdp":"#FF6361"}
if not colourful:
    palette = {k:"#4C78A8" for k in palette} # one colour
# 4️⃣ Plot one tornado per finalist ----------------------------------------
for country in finalists:
    df_plot = (contrib_df[contrib_df["Country"] == country]
               .sort_values("Contribution"))
    fig = px.bar(
        df_plot, x="Contribution", y="Pillar", orientation="h",
        color="Pillar", color_discrete_map=palette,
        text="Share %", template="plotly_white",
        title=f"{country} — Baseline Index Breakdown",
        labels={"Contribution":"Weighted Contribution","Pillar":""},
        width=700, height=420
    )
    fig.update_traces(textposition="inside")
    fig.update_layout(title_x=0.5, bargap=0.35, showlegend=False)
    fig.show()
How to read the “Index‑Breakdown” tornado charts¶
Each figure splits one finalist’s Baseline Attractiveness Index into the six weighted pillar contributions:
| Colour & pillar | Strategic meaning |
|---|---|
| Market Score (dark blue) | Demand size & growth potential |
| Risk Score (pink) | Governance & institutional stability |
| Cost Score (orange) | Labour & operating‑cost advantage (‑ = drag) |
| Mineral Index (violet) | In‑country supply of battery‑critical ores |
| LPI Score (steel blue) | Logistics & infrastructure quality |
| Industry % GDP (red) | Depth of existing manufacturing base |
- Positive bars (→ right) boost the total index;
- Negative bars (← left) show where the country is penalised.
- The number inside each bar is that pillar’s percentage share of the total score.
Australia¶
- Mineral Index +54 % and Risk +34 % do the heavy lifting.
- Solid Market +19 % and Logistics +16 % add support.
- Cost –23 % is the single head‑wind.
Take‑away – A minerals‑and‑governance play; higher wages are the price of stability.
Canada¶
- Similar profile: Risk +35 %, Market +26 %, Minerals +38 %.
- Logistics +21 % slightly stronger than Australia.
- Cost –19 % drag is milder.
Take‑away – Balanced low‑risk option with a marginal cost edge over Australia, though its market is smaller than those of the US, Germany and Japan.
United States¶
- Market +64 % dwarfs all other pillars—demand scale is the story.
- Logistics +22 % and Risk +20 % bolster attractiveness.
- Cost –19 % and Industry –6 % pull the score back.
Take‑away – Sheer market scale offsets cost; industrial share looks low only because the metric is % of GDP, not absolute value.
Germany¶
- Market +48 % and Risk +35 % account for >80 % of the score.
- World‑class Logistics +33 % is a differentiator.
- Cost –12 % and Minerals –4 % are modest drags.
Take‑away – Classic mature hub: big, safe, hyper‑connected—at a cost premium.
Japan¶
- Market +56 % and Risk +36 % dominate.
- Logistics +29 % supports export reliability.
- Cost –11 % and Minerals –11 % are the main weaknesses.
Take‑away – Huge, tech‑savvy market; high costs and limited domestic ore must be mitigated.
Cross‑country insights¶
| Observation | Strategic implication |
|---|---|
| Risk Score is a top‑two driver for four of the five finalists (third for the US, behind Market and Logistics). | Governance stability is non‑negotiable under Baseline weights. |
| Cost Score is negative for all five. | Management willingly pays a wage premium for safe, advanced locations. |
| Mineral Index splits the field. | Australia & Canada gain a decisive boost; Germany & Japan rely on other pillars. |
| Market vs. Minerals trade‑off. | US & Japan ride market scale; Australia & Canada ride minerals; Germany leans on market scale plus logistics. |
| Logistics & Industry share fine‑tune, not decide. | They strengthen high‑ranked countries but rarely rescue low‑ranked ones. |
📊 Finalist Radar — how to use this chart¶
- What it shows – Each loop is a finalist’s mean z‑score on our six pillars (Market · Risk · Cost · Minerals · Logistics · Industry).
- Interactivity
  - Hover for exact numbers.
  - Click legend items to hide/show a single country.
  - Use the buttons above the chart to switch instantly between Mineral Powerhouses (Australia + Canada) and Market Titans (US + Japan).
- Colour‑blind palette – five high‑contrast colours (Wong palette) that remain distinguishable under deuteranopia/protanopia simulations.

Screenshots are fine for slides. The exported shortlist_radar.png (saved whenever kaleido is installed) is a 2×‑resolution static image you can embed in GitHub READMEs, while the Plotly version stays fully interactive inside Jupyter or the HTML export.
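Two details make the radar work. First, each trace repeats its first point so that the polygon closes. Second, every button passes one visibility flag per trace, in trace order. The dependency‑free sketch below shows both. It also includes a hypothetical radial_range helper that widens the axis so no z‑score is clipped. A fixed [-1.5, 2] range would cut off values such as the United States' Market Scale (z) of 3.57 reported in the shortlist. The helper names and profiles are illustrative only.

```python
import math

def close_loop(values, labels):
    """Repeat the first point so a polar trace draws a closed polygon."""
    values, labels = list(values), list(labels)
    return values + values[:1], labels + labels[:1]

def visibility_mask(trace_names, group):
    """One boolean per trace, in trace order, as expected by a button's {'visible': [...]} argument."""
    return [name in group for name in trace_names]

def radial_range(profiles, default_lo=-1.5, default_hi=2.0):
    """Widen the default axis range just enough that no z-score falls outside it."""
    lo = min(min(v) for v in profiles.values())
    hi = max(max(v) for v in profiles.values())
    return [min(default_lo, math.floor(lo)), max(default_hi, math.ceil(hi))]

# Hypothetical z-score profiles on three pillars (not project data).
profiles = {"Alpha": [1.0, 2.2, -0.4], "Beta": [3.6, 0.5, -1.0], "Gamma": [0.2, 1.1, 0.3]}
labels = ["Market", "Risk", "Cost"]
r, theta = close_loop(profiles["Beta"], labels)
print(r, theta)
print(visibility_mask(list(profiles), {"Alpha", "Gamma"}))
print(radial_range(profiles))
```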
# --- Finalist radar with comparison buttons (robust version) ---------------
import plotly.graph_objects as go
import pandas as pd
import numpy as np
# 1️⃣ Pillar columns & labels
pillar_cols = ["market_score","risk_score","cost_score",
"mineral_index","lpi_score","industry_pct_gdp"]
labels = [p.replace("_"," ").title() for p in pillar_cols]
# 2️⃣ Country‑mean z‑scores: reuse the Section 9 frame X_df so radar and index share one standardisation
if "X_df" not in globals():
    # fallback: standardise pillars column‑wise (μ=0, σ=1)
    X_df = master_df[pillar_cols].apply(lambda c: (c - c.mean()) / c.std())
z_mean = X_df.assign(country=master_df["country"]).groupby("country")[pillar_cols].mean()
# 3️⃣ Assemble radar dataframe for the five shortlisted countries
finals = shortlist["Country"].tolist()
radar_df = z_mean.loc[finals, pillar_cols].reset_index()   # index "country" becomes a column
# 4️⃣ Colour‑blind‑safe palette (Wong)
cb_palette = {
"Australia" : "#0072B2", # blue
"Canada" : "#009E73", # green
"United States" : "#D55E00", # vermilion
"Germany" : "#CC79A7", # purple
"Japan" : "#E69F00" # orange
}
minerals = ["Australia", "Canada"]
markets = ["United States", "Japan"]
# 5️⃣ Radar figure
fig = go.Figure()
for _, row in radar_df.iterrows():
    c = row["country"]
    fig.add_trace(go.Scatterpolar(
        r=row[pillar_cols].tolist() + [row[pillar_cols[0]]], # close loop
        theta=labels + [labels[0]],
        name=c,
        line=dict(color=cb_palette[c], width=2),
        fill='toself',
        opacity=0.35
    ))
# visibility masks
vis_all = [True]*len(radar_df)
vis_minerals = [c in minerals for c in radar_df["country"]]
vis_markets = [c in markets for c in radar_df["country"]]
# 6️⃣ Layout & buttons
fig.update_layout(
title=dict(text="Finalist Radar — Six‑Pillar Strength Profile (z‑scores)",
x=0.5, y=0.95),
margin=dict(t=110),
polar=dict(
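        # widen the default [-1.5, 2] range when a finalist's z-score falls outside it (e.g. US Market)
        radialaxis=dict(visible=True, tickangle=45,
                        range=[min(-1.5, float(np.floor(radar_df[pillar_cols].values.min()))),
                               max(2.0, float(np.ceil(radar_df[pillar_cols].values.max())))]),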
angularaxis=dict(rotation=90, direction="clockwise")
),
template="plotly_white",
width=700, height=700,
legend=dict(orientation="h", y=-0.14, x=0.5, xanchor="center"),
updatemenus=[
dict(
type="buttons",
direction="left",
x=0.5, xanchor="center",
y=1.07, yanchor="top",
buttons=[
dict(label="All finalists",
method="update",
args=[{"visible": vis_all}]),
dict(label="Mineral powerhouses",
method="update",
args=[{"visible": vis_minerals}]),
dict(label="Market titans",
method="update",
args=[{"visible": vis_markets}])
],
showactive=True,
pad={"r": 10, "t": 0}
)
]
)
fig.show()
# 7️⃣ Optional PNG for README
try:
    import kaleido # noqa
    fig.write_image("shortlist_radar.png", scale=2)
    print("✅ PNG saved → shortlist_radar.png (2× resolution)")
except ModuleNotFoundError:
    print("ℹ️ PNG not saved — install kaleido (`pip install kaleido`) to enable static export.")
✅ PNG saved → shortlist_radar.png (2× resolution)
📡 How to read the Finalist Radar¶
The radar chart visualises where each finalist scores above (or below) the panel mean on our six pillars. All axes are z‑scores (0 = average across all country‑years in the 2010‑23 panel).
| Axis | What “further out” means |
|---|---|
| Market Score | Larger EV demand & growth |
| Risk Score | Safer governance, lower policy volatility |
| Cost Score | Negative is expensive labour/energy; positive is cheaper |
| Mineral Index | Abundant in‑country lithium, nickel, cobalt |
| LPI Score | Superior logistics & infrastructure |
| Industry % GDP | Bigger manufacturing base, deeper supply web |
- Coloured loops are filled to 35 % opacity so overlap is visible.
- Click legend items to isolate a single country.
Snapshot insights¶
| Country | Stands‑out for | Noticeable weaknesses |
|---|---|---|
| Australia | 🟢 Mineral bounty (extends farthest on Mineral Index) | 🔴 Cost penalty (negative Cost Score) |
| Canada | 🟢 Governance & Mineral balance (Risk + Mineral both > 1 σ) | 🔴 Smaller Industry % GDP slice |
| United States | 🟢 Huge Market spike | 🔴 Cost drag and thin Minerals wedge |
| Germany | 🟢 World‑class Logistics & Risk combo | 🔴 High costs, low Minerals |
| Japan | 🟢 Large Market & solid Risk | 🔴 Mineral deficit and Cost head‑wind |
What to take away¶
- No single country dominates every axis – the shortlist is diversified by strength profile.
- Cost is negative for all five (the Cost vertex sits inside the zero ring), confirming that management prioritises governance and market scale over raw wage savings.
- Strategy implication: pair a minerals‑strong site (Australia/Canada) with a market‑strong site (US/Japan) to hedge supply‑chain and demand risks.
12 · Interactive Weight-Sensitivity Sandbox¶
Fine-tune the Gigafactory Attractiveness Index on-the-fly and watch the ranking reshuffle in real time.
- What it does
  - Creates six sliders, one for each pillar weight (Market, Risk, Cost, Minerals, LPI, Industry).
  - Slider values are renormalised to 100 % before scoring, so you can focus on relative importance rather than arithmetic.
  - Every time a slider moves, the notebook:
    - Recomputes a fresh index for every country in the panel using the new weights.
    - Shows a pie of the current weight split and an updated Top-10 table (2010-23 average).
    - Plots a live 2 × 2 scatter (Index ✕ Risk) so you see where candidates migrate on the board.
- How to use it
  - Drag weights to reflect a “what-if” strategic stance, e.g. push the Cost slider to 0.30 if the board demands ultra-low opex (with the other sliders at their defaults, that is about 26 % after renormalisation).
  - Observe which countries light up or drop out; sanity-check against earlier qualitative flags.
  - Screenshot the configuration that best aligns with stakeholder priorities for slide-ready evidence.
Tip: A weight going to 0 % effectively removes that pillar—handy for stress-testing single-factor dominance.
Run the cell below, then start experimenting.
# --- 12 · Executive playground — live re‑weighting --------------------------
import ipywidgets as wd
import plotly.express as px
import pandas as pd
from IPython.display import display, clear_output
# ╭─ 1. widgets ───────────────────────────────────────────────────────────╮
sliders = {
"Market": wd.FloatSlider(value=0.25, min=0, max=1, step=0.05, description="Market"),
"Risk": wd.FloatSlider(value=0.25, min=0, max=1, step=0.05, description="Risk"),
"Cost": wd.FloatSlider(value=0.15, min=0, max=1, step=0.05, description="Cost"),
"Minerals": wd.FloatSlider(value=0.15, min=0, max=1, step=0.05, description="Minerals"),
"LPI": wd.FloatSlider(value=0.15, min=0, max=1, step=0.05, description="LPI"),
"Industry": wd.FloatSlider(value=0.05, min=0, max=1, step=0.05, description="Industry")
}
ui = wd.VBox(list(sliders.values()))
out = wd.Output()
display(wd.HBox([ui, out])) # side‑by‑side layout
# ╭─ 2. data prep ──────────────────────────────────────────────────────────╮
pillar_cols = ["market_score","risk_score","cost_score",
"mineral_index","lpi_score","industry_pct_gdp"]
# Z‑scores at country‑average level
z_country = master_df[["country"] + pillar_cols].copy()
z_country[pillar_cols] = X_df
z_mean = z_country.groupby("country")[pillar_cols].mean()
# Bubble size helper (always positive)
bubble_src = country_avg.set_index("country")["mean_market"]
bubble_pos = bubble_src - bubble_src.min() + 0.1
# Colour palette (cluster‑aware if available)
palette_default = "#4C78A8"
if "cluster_label" in country_avg.columns:
    palette = {
        "Safe Mature Hubs" : "#007E8C",
        "Risk‑Weighted Frontiers": "#E67800"
    }
else:
    palette = None
# ╭─ 3. refresh callback ───────────────────────────────────────────────────╮
def refresh(*_):
    # normalise weights
    w_raw = {k: s.value for k, s in sliders.items()}
    total = sum(w_raw.values()) or 1
    weights = {k: v/total for k, v in w_raw.items()}
    w_vec = [weights[n] for n in ["Market","Risk","Cost","Minerals","LPI","Industry"]]
    # compute new index
    scores = (z_mean * w_vec).sum(axis=1).sort_values(ascending=False)
    top10 = scores.head(10).round(2).to_frame("Index Score")
    # ── weight pie ───────────────────────────────────────────────────────
    pie_fig = px.pie(
        names=list(weights.keys()), values=list(weights.values()),
        title="Current weight split", width=300, height=300,
        color_discrete_sequence=px.colors.qualitative.Set3
    )
    pie_fig.update_layout(title_x=0.5, margin=dict(t=40, l=0, r=0, b=0))
    # ── 2×2 scatter ──────────────────────────────────────────────────────
    tmp = country_avg.copy().set_index("country")
    tmp["index_live"] = scores
    tmp = tmp.reset_index()
    fig_scatter = px.scatter(
        tmp, x="index_live", y="mean_risk",
        size=bubble_pos.loc[tmp["country"]],
        color="cluster_label" if "cluster_label" in tmp.columns else None,
        color_discrete_map=palette,
        labels={"index_live":"Live Index","mean_risk":"Mean Risk"},
        title="Live 2×2 — Attractiveness vs Risk",
        template="plotly_white", width=800, height=500
    )
    fig_scatter.add_vline(x=tmp["index_live"].median(), line_dash="dot", line_color="gray")
    fig_scatter.add_hline(y=tmp["mean_risk"].median(), line_dash="dot", line_color="gray")
    fig_scatter.update_layout(title_x=0.5, legend_title_text="Cluster")
    # ── render ───────────────────────────────────────────────────────────
    with out:
        clear_output(wait=True)
        display(pie_fig)
        display(top10.style.set_caption("Top‑10 ranking (live weights)"))
        display(fig_scatter)
# initial draw and wiring
refresh()
for s in sliders.values():
    s.observe(refresh, "value")
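The weight renormalisation inside the sandbox's refresh callback can be isolated into a few lines. The sketch below reproduces that step under a hypothetical name, normalise. It also shows a side effect worth knowing: raising one slider does not give that pillar exactly the slider value, because every weight is divided by the new total.

```python
def normalise(raw: dict[str, float]) -> dict[str, float]:
    """Divide each slider value by the total; all-zero input stays all-zero (total falls back to 1)."""
    total = sum(raw.values()) or 1
    return {k: v / total for k, v in raw.items()}

defaults = {"Market": 0.25, "Risk": 0.25, "Cost": 0.15,
            "Minerals": 0.15, "LPI": 0.15, "Industry": 0.05}

# Move only the Cost slider from 0.15 to 0.30, as in the "ultra-low opex" example.
stance = {**defaults, "Cost": 0.30}
w = normalise(stance)
print(round(w["Cost"], 3))   # → 0.261, i.e. 0.30 / 1.15
```

To pin Cost at exactly 30 %, the other sliders have to come down until they total 0.70.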
13 · Recommendations & Next‑Step Workplan¶
13.1 Strategic Recommendation¶
| Rank | Country | Rationale for immediate short‑listing |
|---|---|---|
| 1 | Australia | Unmatched mineral security (Minerals ≈ 54 % of its Baseline index) and top‑tier governance. Recommend a first‑wave feasibility study. |
| 2 | Canada | Balanced scorecard: governance, minerals, logistics. Ideal parallel track to Australia that keeps a North‑America option open. |
| 3 | United States | Largest addressable EV market; IRA subsidies de‑risk capex. Cost drag acceptable given demand scale. |
| 4 | Germany | EU logistics & automotive hub; generous IPCEI battery incentives. High costs offset by talent and proximity to OEMs. |
| 5 | Japan | Tech‑savvy OEM base and stable governance. High labour cost mitigated by JV potential with local cell makers. |
Recommendation: Advance Australia & Canada as primary site contenders; run the United States as a strategic hedge; keep Germany & Japan on the long‑list for OEM‑JV or second‑plant discussions.
13.2 Action Plan (next 60 days)¶
- Board mandate – confirm investment envelope (US $500 M) and risk appetite.
- Country deep‑dives (parallel work‑streams):
  - Incentive scouting – engage Austrade, Invest in Canada, SelectUSA, GTAI, JETRO.
  - Site shortlist – map brownfield vs. greenfield zones within 100 km of Tier‑1 ports & rail.
  - Preliminary JV outreach – battery OEMs / cathode suppliers in each market.
- Site visits – two‑week sprint to top industrial parks in Perth, Quebec, Texas, Saxony, Kyushu.
- Financial model – convert z‑scores into $ / kWh capex & year‑5 OpEx; include IRA, IPCEI, METI grants.
- Risk‑mitigation blueprint – political‑risk insurance (private‑market cover, since MIGA guarantees are limited to developing‑country hosts), FX hedging strategy, supply‑offtake MOUs.
13.3 Secondary‑Research Checklist¶
| Theme | Key data sources | Purpose |
|---|---|---|
| Tax & incentives | PwC Worldwide Tax Summaries, KPMG Taxes & Incentives in Renewable Energy, government investment‑promotion sites | Effective tax rate, R&D credit, property tax holidays, free‑trade zones |
| Customs & tariffs | WTO Tariff Database, UN WITS, UKTR, USTR | Import duties on cathodes/anodes, battery modules, machinery |
| Trade sanctions / export controls | BIS Entity List, EU Sanctions Map, Australian DFAT sanctions list | Ensure no restricted counterparties; dual‑use export licence checks |
| Bilateral & regional FTAs | CPTPP text, CETA, USMCA, EU‑Japan EPA | Confirm preferential tariff pathways for raw materials & battery exports |
| Labour cost & regulation | ILOstat wage data, Mercer Total Remuneration Surveys, OECD employment protection indicators | Five‑year labour cost curve; hire‑fire flexibility |
| Electricity cost & carbon factor | IEA Electricity Market Report, Ember Global Electricity Review, local grid operators | LCOE estimate, Scope‑2 CO₂ for green‑premium modelling |
| Industrial land & utilities | Cushman & Wakefield Global Industrial Guide, local IPAs | Land price, water allocation, grid tie‑in lead‑time |
| Logistics benchmarks | Drewry port throughput, World Bank LPI sub‑pillars, JOC port productivity | Port dwell time, inland freight cost per TEU, customs clearance KPI |
| Political & legal stability | World Bank WGI, Fitch Solutions, Economist Intelligence Unit | Cross‑check tornado “Risk” scores; monitor election cycle events |
| IP protection | WIPO Global Innovation Index, US Chamber IP Index | Safeguard cell chemistry & process IP |
| Environmental permitting | UNEP EnviroRights Map, national EIA statutes | Timeline & stringency of EIA / ESG disclosure |
| Subsidy & investment‑screening compliance | EU anti‑subsidy, US CFIUS guidelines, OECD export credits | Avoid reversal risk or national‑security scrutiny |
13.4 Optional Deep‑Dives (if Board requests)¶
- Monte‑Carlo volatility analysis – probability each finalist stays Top‑5 under ±1 σ pillar noise.
- CO₂‑adjusted cost curve – include carbon‑price shadow for 2030.
- Dual‑sourcing feasibility – split cathode supply across Australia‑Canada to derisk geopolitical shocks.
- Time‑zone & headquarters overlap – optimise for real‑time engineering collaboration.
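The Monte‑Carlo item could start from a sketch like the one below. It rests on one assumed reading of "±1 σ pillar noise". Each draw adds independent N(0, σ) noise to every country‑mean pillar z‑score and re‑scores with the Baseline weights. The sketch then counts how often each country lands in the Top‑k. The function top_k_stability and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd

def top_k_stability(z_mean, weights, k=5, n_draws=2000, sigma=1.0, seed=0):
    """Share of noisy re-scorings in which each country lands in the Top-k.

    Assumes independent N(0, sigma) noise on every pillar z-score.
    """
    rng = np.random.default_rng(seed)
    base = z_mean.to_numpy(dtype=float)
    w = np.asarray(weights, dtype=float)
    hits = np.zeros(len(z_mean))
    for _ in range(n_draws):
        noisy = base + rng.normal(0.0, sigma, size=base.shape)
        top = np.argsort(-(noisy @ w))[:k]     # row positions of the k best scores
        hits[top] += 1
    return pd.Series(hits / n_draws, index=z_mean.index).sort_values(ascending=False)

# Hypothetical inputs: five average countries and one far-behind outlier "Z".
toy = pd.DataFrame(np.zeros((6, 6)), index=list("ABCDEZ"))
toy.loc["Z"] = -10.0
baseline_w = [0.25, 0.25, 0.15, 0.15, 0.15, 0.05]
stability = top_k_stability(toy, baseline_w, k=5, n_draws=500)
print(stability)
```

With real inputs, the sandbox's country‑level z_mean frame and the Baseline weight list would slot straight in. Independent Gaussian noise ignores correlation between pillars, which a fuller study would want to model.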
Final call‑out¶
With Australia and Canada leading on both governance and mineral security, management can move to field‑level due diligence. The recommended research streams are designed to surface any hidden red flags (tax, tariff, sanctions, logistics) that could undermine the macro case. They should also convert today’s index lead into a fully costed, contract‑ready location decision within the next quarter plus two months.